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20.  ABSTRACT  (Continue  on  reverae  aide  If  neceeaary  end  Identify  by  block  number) 

Design  of  input  signals  to  enchance  the  estimation  on  unknown  parameters  in 
discrete  time  dynamics  is  considered.  The  system  identification  problem  can 
be  considered  as  the  initial  phase  of  a stochastic  control  problem  of  a dynamic 
system.  In  such  a situation  it  is  desirable  to  estimate  the  parameters  as  rapidl 
as  possible  without  disturbing  the  normal  operationcf  the  process.  The  problem 
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is  represented  in  a JSaysian  frameworkjwhere  the  a posteriori  probability 
P(0  | ZN)  contains ythe  information  supplied  by  the  observations  about  the 
parameters.  C^The  optimality  criterion  for  the  synthesis  problem  is  defined 
in  terms  of  the  statistical  distance  measure  between  distributions 
characterized  by  distinct  parameter  values.  In  this  dissertation  we  define 
this  quantity  in  terms  of  the  pairwise  Bhattacharyya  distance.  The  distance 
measure  has  the  desirable  property  of  ordering  the  Bayes1  probability  of 
error  as  a function  of  two  experimental  design  programs.  The  statistical 
distance  measures  have  been  successfully  used  in  studying  taxonomy  and 
characterization  of  racial  distributions.  It  is  shown  that  such  an  approach 
is  also  feasible  in  applications  to  more  general  dynamic  situation.  ^ 


Several  useful  properties  of  the  B -distance  are  derived.  Its 
relationship  to  the  Fisher  information  is  obtained,  where  the  differential 
metric  has  an  interpretation  in  terms  of  the  sensitivity  of  the  observations 
with  respect  to  the  parameters.  A lower  bound  on  the  mean  square  error 
is  derived  using  the  relationship  between  the  distance  measure  and  the 
mutual  information.  Application  to  dynamic  systems  is  made  via  the 
adaptive  filtering  formulation  where  the  parameters  have  a finite  set  of 
discrete  values  and  the  estimation  is  performed  in  a parallel  processing 
scheme. 

An  alternative  approach  to  the  input  synthesis  problem  is  obtained 
by  generating  these  signals  from  linear  .stochastic  processes  of  the  auto- 
regressive moving  average  type.  This  approach  to  the  selection  of  random 
inputs  gives  the  additional  degrees  of  freedom  of  choosing  the  parameters  of 
the  input  process,  maximizing  the  cost  function.  The  asymptotic  property 
of  the  input  correlation  matrix  is  stuuieu  using  uie  asymptotic  eigenvalue 
distribution  of  Toeplitz  matrices.  When  the  variance  of  the  white  noise 
generating  the  linear  inputs  is  constrained,  it  is  shown  that  among  the  class 
of  all  linear  inputs  that  input  for  which  the  oytim^lity  criterion  is  maximum 
also  has  the  largest  value  for  the  intergral  — f (X)  d^ where 

2 Z TT  - 

^uu  = ^uu  (^)/CTy  with  S (^)  the  spectrum  of  corresponding  stationary 
linear  process. 

The  effect  of  feedback  on  the  input  sequence  has  been  studied  in 
terms  of  the  open-loop  feedback  formulation.  At  each  stage  k,  of  the 
dynamic  process,  future  inputs  are  selected  based  on  the  past  and  present 
observation  program,  and  the  first  component  of  this  subsequence,  U , 
is  selected  as  the  input  u^.  The  feedback  is  introduced  when  an  ~ ^ 
additional  observation  is  incorporated  in  computing  the  future  inputs.  Tt 
is  shown  here,  thatthe  feedback  at  stage  k,  can  be  represented  as  the  sum 
of  open-loop  input  at  stage  k and  a correction  term. 
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ABSTRACT  OF  TIIE  DISSERTATION 


Synthesis  of  Input  Signals  in 
Parameter  Estimation  Problems 

by 

Belle  Raghavendra  Upadhyaya 
Doctor  of  Philosophy  in  Engineering  Sciences 
University  of  California,  San  Diego,  1975 
Professor  Harold  W.  Sorenson,  Chairman 

Design  of  input  signals  to  enhance  the  estimation  of  unknown 
parameters  in  discrete  time  dynamics  is  considered.  The  system 
identification  problem  can  be  considered  as  the  initial  phase  of  a 
stochastic  control  problem  of  a dynamic  system.  In  such  a situation 
it  is  desirable  to  estimate  the  parameters  as  rapidly  as  possible 
without  disturbing  the  normal  operation  of  the  process.  The  problem 
is  represented  in  a Baysian  framework  where  the  a posteriori 
probability  P(0jzN)  contains  the  information  supplied  by  the 
observations  about  the  parameters.  The  optimality  criterion  for  the 
synthesis  problem  is  defined  in  terms  of  the  statistical  distance 
measure  between  distributions  characterized  by  distinct  parameter 
values.  In  this  dissertation  we  define  this  quantity  in  terms  of  the 
pairwise  Bhattacharyya  distance.  The  distance  measure  has  the 
desirable  property  of  ordering  the  Bayes'  probability  of  error  as  a 
function  of  two  experimental  design  programs.  The  statistical 
distance  measures  have  been  successfully  used  in  studying  taxonomy 
and  characterization  of  racial  distributions.  It  is  shown  that  such  an 

approach  is  also  feasible  in  applications  to  more  general  dynamic 
situation. 


XIX 


Several  useful  properties  of  the  B-distance  are  derived.  Its 
relationship  to  the  Fisher  information  is  obtained,  vhere  the 
differential  metric  has  an  interpretation  in  terms  of  the  sensitivity  of 
the  observations  with  respect  to  the  parameters.  A lower  bound  on 
the  mean  square  error  is  derived  using  the  relationship  between  the 
distance  measure  and  the  mutual  information.  Application  to  dynamic 
systems  is  made  via  the  adaptive  filtering  formulation  where  the 
parameters  have  a finite  set  of  discrete  values  and  the  estimation  is 
performed  in  a parallel  processing  scheme. 


An  alternative  approach  to  the  input  synthesis  problem  is 

obtained  by  generating  these  signals  from  linear  stochastic  processes 

of  the  autoregressive  moving  average  type.  This  approach  to  the 

selection  of  random  inputs  gives  the  additional  degrees  of  freedom  of 

choosing  the  parameters  of  the  input  process,  maximizing  the  cost 

function.  The  asymptotic  property  of  the  input  correlation  matrix  is 

studied  using  the  asymptotic  eigenvalue  distribution  of  Toeplitz 

matrices.  When  the  variance  of  the  white  noise  generating  the  linear 

inputs  is  constrained,  it  is  shown  that  among  the  class  of  all  linear 

inputs  that  input  for  which  the  optimality  criterion  is  maximum  also 

1 TT 

has  the  largest  value  for  the  intergral  - — / f (X  ) d X where. 

2 Z Ti  J.  u uu 

f (X)  = S (X  )/<?,.  with  S (X)  the  spectrum  of  corresponding 
uu  uu  V uu 

stationary  linear  process. 

The  effect  of  feedback  on  the  input  sequence  has  been  studied 

in  terms  of  the  open-loop  feedback  formulation.  At  each  stage  k , 

of  the  dynamic  process,  future  inputs  are  selected  based  on  the  past 

and  present  observation  program,  and  the  first  component  of  this 

subsequence,  U , is  selected  as  the  input  u . The  feedback  is 
’ K r IN  K 

introduced  when  an  additional  observation  is  incorporated  in 
computing  the  future  inputs.  It  is  shown  here,  that  the  feedback 


CHAPTER  I 


INTRODUCTION 


A large  number  of  physical  systems  require  the  determination 
of  their  structural  parameters.  Some  of  these  require  the  estimation 
of  the  parameters  during  normal  operations  while  for  the  others  a 
separate  test  can  be  carried  out  to  accomplish  this.  A wide  range  of 
systems  belong  to  this  category  - chemical  industry  processes, 
aerospace  applications,  nuclear  reactors,  biological  systems,  etc.. 
These  systems  are  identified  by  exciting  them  with  suitable  input 
signals  and  observing  the  resulting  response  of  the  system.  The  inputs 
can  be  deliberately  chosen  in  many  instances  so  as  to  extract  maximum 
information  from  the  physical  system. 

The  type  of  response  of  a system  may  belong  to  the  following 
three  categories: 

(i)  If  the  input  U is  merely  observed  (not  controlled  in  any 
way),  what  kind  of  information  is  conveyed  by  the  observations  Z 
under  normal  operation? 

(ii)  If  the  input  U is  changed  in  some  specific  way  what 
change  is  induced  in  the  behavior  of  Z ? 

For  the  first  case  we  need  to  observe  the  data  without  interfering  with 
the  normal  operation  of  the  system.  The  second  question  can  be 


answered  by  deliberately  changing  the  input  by  a proper  experimental 
design. 

(iii)  A third  class  of  problems  is  that  of  driving  a system 
from  an  initial  state  to  a designated  target  and  the  system  requires  the 
determination  of  unknown  parameters.  This  combined  problem  of 
control  and  identification  is  called  "dual  control"  (see  Fel'dbaum  [26], 
pp.  319-33C).  Here  the  control  action  is  required  to  divide  the  total 
energy  optimally  between  estimation  and  control. 

In  this  dissertation  we  address  ourselves  to  the  solution  of  the 
question  posed  by  (ii)  above.  Even  though  no  control  aspect  is 
included  in  the  analysis,  the  system  identification  problem  can  be 
treated  as  an  initial  phase  of  a stochastic  control  problem  for  a 
dynamic  system.  In  such  a situation  it  is  desirable  that  the  parameters 
be  determined  as  rapidly  as  possible  without  disturbing  the  system  so 
as  to  cause  a deterioration  of  its  normal  operation.  This  phase  can  be 
used  to  identify  the  parameters  up  to  a desirable  accuracy  and  the 
complete  learning  will  eventually  take  place  during  the  control  phase. 
This  separation  of  the  identification  and  control  differs  from  the 
definition  of  dual  control  where  the  identification  and  control  take  place 
instantaneously.  The  solution  to  a dual  control  problem  can  be.  very 
expensive  and  since  the  control  policy  depends  heavily  on  the 
parameter  learning,  the  total  error  could  be  very  large. 
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3 

The  necessity  of  a fast  and  efficient  identification  phase  requires 
the  selection  ot  an  appropriate  input  sequence  where  the  total  input 
energy  is  constrained  and  such  that  the  optimally  criterion  uses  all  the 
available  a priori  information  about  the  system. 

The  problem  is  represented  in  a Bayesian  framework  where  the 
a posteriori  distribution  of  the  parameters,  P(_6  ( Z ; , contains  the 
information  supplied  by  the  observations  about  the  parameters,  and 
there  is  a sufficient  statistic  for  the  problem.  In  the  following  sections 
the  approach  to  the  input  synthesis  problem  is  outlined,  showing  how 
the  application  of  discriminant  functions  of  statistics  can  be  extended  to 
dynamic  systems. 

1 . 1 The  Input  Signal  Selection  Problem 

The  problem  of  experimental  design  was  considered  by  many 
statisticians  in  application  to  regression  problems.  Since  there  is  no 
controllable  objective  in  general,  application  to  input  problem  has  been 
very  few.  The  statistical  regression  model  is  described  by: 

y=_f(£)T9+v  (1.1-1) 

where  Q are  the  regression  parameters  and  _f  (£)  is  a function  of  the 
experimental  variable  £ . The  major  studies  have  been  those  of 
Kiefer  [46],  Kiefer  and  Wolfowitz  [47],  Elfving  [24],  Karlin  and 
Studden  [44]  and  Fedorov  [25].  These  are  primarily  directed  towards 
analyzing  linear  regression  problems. 


In  communication  systems  the  optimum  signals  are  those  that 
minimize  the  probability  of  error.  Generally  referred  to  as  the 
sphere  packing  problem  they  arc  solved  by  geometrical  means  by 
several  investigators  - Landau  and  Step  ian  [51],  Stutt  [74], 
Balakrishnan  [7]  and  V/eber  [82].  In  the  present  work  we  are 
concerned  with  design  of  input  signals  as  applied  to  dynamic  systems 
represented  by  a state  space  model  or  an  input-output  model. 


1.1.1  Statement  of  the  Problem 

Consider  a sequence  of  observations 

« Q 

process  which  is  parameterized  by  f R 


[z,  } of  a stochastic 
~ k 

such  that 


Zy,  = (0  . uk  , k)  + v_k  , k = 1,  2, (1.  1-2) 

Let  0 < © C Rq  where  © is  a compact  set  in  R . The  unknown 
parameters  S_  are  to  be  estimated  with  the  knowledge  of 
{ _z  , k = 1,  2,  ...  , N}  . More  specifically,  consider  that  the  signal 
y = H x where  x-  is  the  output  of  a linear  dynamic  system. 
Consider  the  following  system  representation: 

— k ~ k-l+— uk-l  + — k-1  (K!'3) 

z=Hx  +v  , k=l,  2,  ...  (1.1-4) 

k “ k — k 


&(nxn),  fl(nxl)  and  ll(sxn)  are  time  invariant  matrices. 


w,  and  v, 
— k — k 


are  v/hite  noise  sequences  with 
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Ef"\]  = Efv,]  = o , V k 


(1.  1-5) 


E[w  :1\]  = Q6-,  • Etxxf]  = 

J K J K J K 


R 6., 

jk 


E[w.v  ] = 0 , V . 

J k j , k 


Initial  state  x is  a random  variable  with 
— o 


E[x  ] - 0 , E[x  x ] = P 
~~o  — — o ~~o  o 


(1.  1-6) 


(1.  1-7) 


Further 


E[£  wj]=  E[x  vj]  = 0 , V. 


(1.  1-8) 


In  general  the  structural  parameters  are  taken  as 

= > §_  t H,  Qf  R)  . We  assume  that  the  noise  statistics  Q and 

R are  known.  Further  the  measurement  matrix  H is  also  assumed 

to  have  a known  structure.  The  unknown  parameters  are  taken  to  be 

the  dynamic  system  parameters  •!>  and  £ . Thus  J9  = (•>  , ,9  ) . With 

a prior  distribution  P(9j  let  the  system  be  in  trie  canonical  form 

such  that  the  pair  ('5  , fi)  has  a maximum  of  2n  parameters.  Let 

the  eigenvalues  of  matrix  $ be  within  the  unit  circle  ($>  is  stable). 

It  is  required  to  choose  the  inputs  {u,  , k = 0 , 1 , ...  N-l}  such  that 

k 

a prescribed  optionality  criterion  representing  the  learning 
performance  of  the  parameters  is  maximized.  The  presence  of  the 
exciting  force  {u^}  is  necessary  so  that  the  observation  program 
retains  the  maximum  information  about  the  stable  dynamic  matrix  0 . 
In  the  absence  of  this,  as  time  progresses  the  information  about  the 
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$ matrix  in  the  observations  v/ill  be  reduced  and  the  observations  will 
be  dominated  b^  .he  noise  variable,  thus  diminishing  the  parameter 
learning.  We  require  the  system  to  be  stable,  controllable  and 
observable;  these  conditions  are  discussed  in  Section  (3-1). 

We  pointed  out  that  the  a posteriori  distribution  P(9]zk)  , 
k-l»  2,  ...  , is  a sufficient  statistic  for  the  parameter  learning.  A 
direct  maximization  of  this  quantity  is  not  feasible.  Consequently,  we 
look  for  a criterion  which  has  an  ordering  with  respect  to  the 
a posteriori  probability  generated  by  Bayes1  rule. 

It  is  also  necessary  that  the  cost  function  use  all  the  a priori 
information  about  the  parameters.  An  alternate  approach  is  to  use 
adaptive  filtering  theory.  In  practical  filters  the  parameters  £ are 
restricted  to  have  a finite  number  of  known  values  .fi.  , i = 1,  2,  . . . , m, 
with  a priori  distribution  P(.9j  = P.^  , i = 1,  2,  ....  m.  Then  the 
adaptive  filter  consists  of  m filters  operating  in  parallel  such  that  the 
output  of  the  processor  is  a linear  combination  of  the  output  of  these 
m filters.  The  learning  device  calculates  the  a posteriori 
probabilities  recursively  using  Bayes'  rule 


. k Pfeklz  ‘ . £>*<£l5  > 

P(.p|z  )=  i 

i m , . 

V*  — / lr-,K“X  A \ r,  I 


(1.  1-8) 


? . PK  |zk_1  , .9)  P(.0  | 

J = 1 k ~ J~  J~ 


l “ 2 p • • • 0 l n p 1\  ~ 1 p 2 f • • • • 


In  the  framework  of  hypotheses  testing  the  m discretizations  can  be 
considered  as  constituting  m-hypothcses  with  which  one  can  associate 
the  joint  densities  p(Z  | 0)  , i = 1 , 2,  . . . , m . A suitable  criterion 
of  distinguishing  these  distributions  can  be  formulated  as  a measure  of 
distance  between  the  pairwise  distributions.  We  would  like  to  have 
for  these  distance  measures  the  following  property.  For  a given  pair 
<£a  "iY  >f  0 if  dav  <%>  % <U£>  arc  the  distance  measures 

for  two  input  designs  U and  IF  , and  if  d (U  ) > d (U«.)  then 

V C ay  n ay  C 

the  probability  of  errors  (in  the  sense  of  minimum  Bayes’  risk)  will 

have  the  ordering  P (6  , 0 , U ) < P (0  , 9 , U- ) . By  Bayes' 

minimum  risk  criterion  when  the  hypothesis  with  the  greatest  posterior 

probability  is  selected,  the  probability  of  error  P is  also 

e 

minimized  (Van,  Trees  [8l]).  In  the  following  a brief  history  of  the 
application  of  these  measures  in  statistics  and  the  extension  to  the 
present  context  are  discussed. 

1.1.2  The  Proposed  Approach 

In  1 967  Gagliardi  [29j  proposed  that  by  considering  the 
parameters  as  members  of  a finite  set,  the  criterion  for  the  optimal 
input  can  be  defined  as  that  which  maximizes  the  probability  of 
correctly  determining  the  true  parameter  value  from  a multiple 
hypotheses  test.  He  showed  that  this  probability  is  maximized  by 
those  input  sets  whose  corresponding  output  has  the  largest  perimeter 


when  plotted  in  the  output  vector  space.  However,  in  many  cases 
either  directly  maximizing  the  probability  of  identifying  the  correct 
set,  or  minimizing  the  probability  of  error  is  often  impossible.  This 
may  be  because  an  analytical  expression  for  the  error  probability  is 


too  difficult  to  find,  or  even  if  an  expression  can  be  derived,  it  may  be 
too  difficult  for  analytical  or  numerical  optimization.  Therefore,  we 
look  for  a signal  selection  criterion  that  is  weaker  than  the  error 
probability,  but  is  related  to  it  and  is  easier  to  implement. 


Let  P(P)  be  the  prior  probability  distribution  of  6 (6CR  . 
Where  © is  a compact  set  in  . Then  a suitable  criterion  can  be 


expressed  in  terms  of  the  distance  d(0^  , 0^  ) in  ©.  In  binary 


hypotheses  testing  these  are  the  distributions  of  observations 


N | N , 

p(Z  | ) and  p(Z  [0  ) . By  making  this  distance  large,  we  would 


be  able  to  distinguish  between  the  two  distributions,  making  the  error 
probability  as  small  as  possible.  In  an  adaptive  filtering  problem 
where  we  have  multiple  hypotheses  we  can  define  the  average  distance 
as  the  criterion. 


m m 


P(0  = . 0)  P{0=  . 
i=lj=i+l  1 J 


= .6)  d(.0,  .0) 
i-  j- 


(1.  1-9) 
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In  statistical  literature  the  discriminant  functions  such  as  - Pearson' 
coefficient  of  racial  likeness  (Kao  [66,  p 356j),  Fisher's  linear 
discriminant  function  (Anderson  [2,  p 134])  and  Mahalanobis 
DZ-statistic  [ 5 S]  were  employed  in  studying  race  mixtures  and 
taxonomy  (orderly  classification  of  plants  and  animals  according  to 
their  presumed  natural  relationships).  For  purposes  of  comparison 
the  above  quantities  are  defined  as  follows: 

Pearson's  Coefficient  of  Racial  Likeness  (CRL) 

If  n^.  and  n denote  the  number  of  observations  on  which  the 
means  m.^  and  rn.^  character  for  the  first  and  second 

groups  are  based,  s^  is  the  standard  deviation  of  the  ith  character 
and  p is  the  number  of  characters  used,  the  n 


, P n n m -m 
1 v'*  ll  2i  ll  i2 


CRL  = - V — - — — 
P . , n,  . + n . 


p . , n . + n 
i“l  ll  2i 


(1.  1-11) 


Fisher's  Linear  Discriminant  Function  (LDF) 

11  P^(x)  an£l  P^ ) are  two  multivariate  Gaussian  densities  such 
that  Pj(x)  = ,^)  and  P2(x)  = G(l£2  ) * then 


LDF  = 


" ^ £ 'nrHz’ 


(1. 1-12) 


This  is  a linear  function  of  the  components  of  the  observation  vector  x. 


Mahalanobis  D - Statistic 


For  two  multivariate  Gaussian  densities  Pj(x)  and  p^(x)  D* 
statistic  is  given  by 


-T-  ■ p fin  ill  II  ■U, 


DZ  = (ji1-li2)TE‘1  (lii-Hz) 


(1. 1-13) 


Only  recently  the  interest  in  application  of  distance  measures  in 
engineering  has  been  focused  towards  the  problem  of  signal  design. 
Grettenberg  [34]  used  the  divergence  measure  introduced  by  Kullback 
and  Leibler  [49],  for  signal  selection  in  communication  and  radar 
systems.  Divergence  has  been  used  by  several  others  in  application 
to  pattern  recognition  problems  - Kadota  and  Shepp  [40],  Tou  and 
Heydorn  [75]  Fu,  Min  and  Li  [28]  and  Caprihan  and  de  Figueiredo  [20]. 
Mosca  [64]  considered  the  problem  of  selecting  optimal  probing  signals 
for  identifying  tirne-vary ing  linear  M-ary  channels  by  maximizing  the 
average  divergence  under  the  energy  constraint.  In  a recent 
application  to  adaptive  filtering.  Smith  [71  ] used  the  average 
divergence  to  compute  test  signals  in  linear  multistage  processes.  The 
adaptive  filter  is  similar  to  that  of  Magill  [5  7]  and  consists  of  m 
Kalman  filters  in  parallel.  For  a given  pair  (9  ,6  )e  © . The  pair- 

0-  y 


wise  divergence  is  given  by 


\y-JK  [p(2NLV-p(zNIV]i» 


z efi 


p(zN| 


P(ZN|S 


y n 

— dZ 


(1.1-14) 


Several  properties  of  this  measure  are  given  in  Kullback  [48,  pp.  12- 


Bradt  and  Karlin  [16]  showed  that  if  J (U  ) and  J (U-) 

ay'  rj  ay'  c 

correspond  to  signals  sets  U and  U„  and  if  P (9  ,9  , U ) £ 

H h 77  C e —a  ’ -y  t) 


Pe^ay  ’ — v ' UC  * thcn  Jay(uV  3 ay  <uc»  • Wh«”  Pc  1=  the 

probability  of  error.  The  result  holds  for  all  prior  probabilities 

(Ptt  > Py  ) • Grettcnberg's  studies  showed  that  the  use  of  divergence 

could  result  in  very  poor  signal  sets  compared  to  direct  minimization 

of  P£  . We  define  another  distance  measure  which  has  the  above 

ordering  property  with  respect  to  the  probability  of  error. 


The  ordering  property  of  B as  well  as  that  of  the  divergence 
follows  from  a theorem  due  to  Hardy,  Littlewood  and  Polya  [3  5],  Also 
see  Blackwell  [ 1 4 j . 


The  divergence  and  B-distance  are  not  equal  in  general.  For 
dynamic  systems  where  they  are  unequal  the  results  obtained  by 
applying  these  discriminant  functions  will  differ.  There  is  no  general 
result  relating  the  performance  of  these  two  measures.  It  was  shown 
by  Grettenberg  [34]  and  Kailath  [4l]  that  for  the  problem  of  selecting 
communication  links,  the  result  obtained  from  the  B-distance  was 


better  than  that  obtained  by  the  divergence,  coinciding  with  the  result 
obtained  by  the  direct  minimization  of  the  error  probability.  This  is 


12 


also  the  case  in  input  selection  problems  in  linear  dynamic  systems 
with  nonzero  dynamic  noise,  as  illustrated  in  Chapter  3.  (Sec  example 


3.  5.  1). 


The  first  part  of  the  dissertation  is  primarily  concerned  with 
applying  the  distance  measures  to  synthesize  input  signals  in  discrete- 
time  linear  stochastic  systems.  The  learning  performance  with  the 
inputs  obtained  from  the  B-distance,  divergence  and  nonoptional  inputs 
are  compared.  Numerical  results  show  that  the  optimal  inputs  can  be 
adequately  represented  by  low-order  difference  equations. 

The  synthesis  of  random  inputs  is  considered  as  an  alternate 
approach  to  the  signal  selection  problem.  These  inputs  are  generated 
by  linear  stochastic  processes  of  the  autoregressive  moving  average 
type,  and  the  method  gives  rise  to  a reduced  optimization  in  the  space 
of  input  process  parameters.  The  characterization  of  optimal 
stochastic  inputs  is  studied  in  terms  of  their  spectral  densities. 

1.2  Review  of  Literature 

In  this  section  a brief  history  of  the  development  of  input 
selection  problem  is  discussed.  In  I960  Levin  [54]  considered  the 
estimation  of  the  impulse  response  function  of  a discrete  time  model 
given  by: 


z.  = 22  a.  g.  (h)  + V'  . k = 0,  1 , . . . , N 


(1.2-1) 


The  least  squares  estimate  of  a = (a  , a,  . . . , a ) ;s  riVCn  bv 
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where  the  matrix  G is  given  by 
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For  white  noise  w with  variance  CT  the  error  covariance  matrix 
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E[(-"— LS*  (-  ' -LS,T]  = ’wt007]  <*-2-4> 


It  was  shown  by  Levin  that  the  trace  of  the  covariance  matrix  is 
minimized  by  using  a white  noise  sequence  as  inputs  (i.  e.  the 
quantities  g.(k)). 

Levadi  [ 53]  used  a continuous  time  model  of  the  form 


= J w(f,  r,  B)  u(r)  dr  + n(t)  , tc[o,T] 


(1.2-5) 


,S  arc  the  parameters  of  the  weight  in  function.  He  showed  that  the 


optimal,  energy-constra  inccl  inputs  arc  obtained  by  solving  a 
Fredholm  integral  equation  of  the  second  kind.  Arirnoto  and  K i mi 


imura 


[5]  considered  the  system  given  by  (I.  1-9)  and  derived  results  similar 
to  those  of  Levin  by  ihe  maximization  of  the  approximate  mutual 
information  when  the  inputs  are  amplitude  constrained. 


Aoki  and  Staley  [3]  consider  a scalar  difference  equation  of  th< 


form 


= Z a xk-i+ul 


(1.2-6) 


zk  “ xk  + vk  ’ k l’  Z’ 


(1.2-7) 


and  maximize  the  trace  of  the  Fisher  information  matrix  to  desi 


esign 


energy  constrained  inputs.  The  Fisher  information  is  defined  by  a 
q X q matrix  with  elements 


MU  = [E  ^ 1”p(2N|2)4-  ln  p(2NI“)1 

1 j J 

The  approximation  to  the  trace  is  made  in  the  sense  that 


trace  M (trace  M)  * as  N *♦  03 


(1.2-8) 


(1.2-9) 


Nahi  and  Wallis  [65]  give  an  optimal  control  formulation  of  the 
synthesis  problem  and  use  the  Cram6r-Rao  lower  bound  to  det 


ermine 


optimal  energy  constrained  inputs.  The  Cramer-Rao  lower  bound  on 
the  estimation  error  is  given  by 


IB 


E [(£  -f)  («  - 0)T]  ^ M"1  (1.2-10) 

The  nonlinear  deterministic  plant  of  the  following  form  has  been  used. 

x = f(x,  (t),u(t),  t,  a),  z(o)  - x , te[o,  T]  (1.2-11) 

o 

where  atR* 

z(t)  = g[x(t),u{t),t,a]  + v(t),te[0,  T]  (1.2-12) 

Dynamics  similar  to  the  above  are  also  used  by  Kalaba  and  Spingarn 
[42]  who  maximized  the  sensitivity  of  the  dynamic  function  with  respect 
to  the  parameter  subject  to  an  energy  constraint.  The  solution  to  a 
two  point  boundary  value  problem  is  obtained  via  quas ilinerazation. 
Goodwin,  Murdoch  and  Payne  [ 3 0 J employed  the  trace  of  the  inverse  of 
Fisher  information  matrix  to  obtain  input  sequences  for  a single-input 
single-output  system  having  amplitude  constraints.  A vector  form  of 
this  case  was  the  topic  of  a dissertation  by  Lopez-Toledo  [56]  who 
considered  feedback  laws  of  the  form 
l 

uk=E  Bi(k)yk  + hk  (1.2-13) 

i=  1 

where  g.(k)  and  h are  the  quantities  to  be  determined  by 

l K 

optimization  of  the  sensitivity  function. 

In  all  of  the  above  mentioned  works  [3,30,  42,  56,  65]  the  authors 
used  basically  the  same  criterion  for  the  synthesis  problem  and 
(except  for  [30])did  not  compare  the  performance  of  these  signals 
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with  nonoptimal  inputs.  Absent  also  is  a discussion  of  the  validity 
arid  approximations  involved  in  the  above  procedures.  It  is  very 
important  to  realize  that  the  cost  criterion  is  a function  of  the  unknown 
parameters  to  be  identified,  and  holds  only  for  large  N . 

Recently  Mehra  [62,  63]  used  some  concepts  from  experimental 

designs  in  linear  repression  models  (see,  e.  g.  Fedorov  [2  5 ) to  design 

optimal  inputs  in  linear  systems.  Approximation  to  the  regression 

form  is  carried  out  by  a Taylor  series  expansion  of  the  system 

transfer  function  about  an  a priori  mean  9 . Applying  the  results  of 

— o 

repression  design  techniques  Mehra  showed  that  under  steady-state 
conditions,  an  input  consisting  of  a finite  number  of  frequencies  can  be 
found  such  that  it  has  the  same  information  matrix  as  any  other 
stationary  input  of  equal  power.  The  optimal  inputs  are  obtained  by- 
repeating  the  experiment  through  the  same  length  of  time  for  each  run, 
until  the  estimates  of  consecutive  runs  converge.  Mehra's  approach, 
though  by  far  the  most  novel  method  for  the  solution  of  this  class  of 
problems,  requires  a good  justification  of  the  approximation  involved 
in  the  Taylor  series  expansion.  The  approximation  includes  only 
terms  up  to  the  first  order.  It  is  quite  likely  that  the  successive 
estimates  of  the  parameters  are  such  that  the  difference 

A 

=0,-9  , i = 1 , Z, , may  he  very  large  and  the 

i i i 

approximation  may  be  very  poor.  This  could  easily  give  rise  to  a 
divergence  in  both  the  input  sequence  and  the  resulting  estimation. 


and  a solution  may  not  exist.  In  off-line  applications  this  may  not 
prove  to  be  a hindrance,  but  in  systems  where  the  identification  has 
to  be  carried  out  during  the  process  itself,  more  analysis  is  required 
in  implementing  this  method.  In  spite  of  these  drawbacks,  the  method 
of  [62,63]  constitutes  a distinct  departure  from  the  previous 
approaches  and  has  broadened  the  scope  of  input  selection  for 
applications  in  physical  systems. 

Some  authors  have  also  experimented  with  signals  of  special 
character.  Cumming  [22]  applied  a synchronous  random  telegraph 
wave  and  varied  its  bandwidth  to  match  the  frequency  response  of  the 
system.  Since  the  frequency  response  of  the  system  is  assumed  to  be 
known,  these  signals  performed  better  than  the  pseudo  - random  binary 
perturbations  discussed  by  Briggs  et  al,  [18].  Box  and  Jenkins  ([15], 
pp.  416-420)  used  a first  order  autoregressive  input  in  application  to 
a simple  system.  Litman  and  Huggins  [55]  used  growing  exponentials 
in  identifying  transfer  function  models. 

1. 3 Outline  of  The  Dissertation 

The  deterministic  signals  derived  from  the  maximization  of 
B-distanee  and  divergence  measures  is  of  the  open-loop  type.  Only 
a priori  information  about  the  system  is  used  in  synthesizing  the  input 
and  no  observation  information  is  associated.  The  solution  to  the 
constrained  problem  will  be  simplified  if  the  cost  function  has  a 
numerically  tractable  form  so  that  the  solution  is  globally  optimum. 


It  was  shown  in  [80]  that  the  distance  measures  for  linear 
systems  is  indeed  quadratic  in  the  inputs  and  the  constrained 
maximization  has  a global  optimum. 

The  effect  of  feedback  is  studied  in  terms  of  the  open-loop 
feedback  synthesis.  Here  at  each  stage  k the  inputs  for  the  future 
steps  are  computed  based  on  the  past  observations  and  the  first 
element  of  this  subsequence  is  taken  as  u^  . The  feedback  is 
introduced  when  a subsequent  measurement  is  taken  and  the  procedure 
is  repeated. 

Synthesis  of  random  inputs  is  considered  as  an  alternative 
approach  to  the  signal  selection  problem.  The  signals  are  generated 
using  linear  stochastic  processes.  The  optimization  problem  consists 
of  choosing  the  process  parameters  by  the  maximization  of  sensitivity 
of  the  observations  with  respect  to  the  parameters. 

In  Chapters  2 and  3 we  present  the  discussion  of  distance 
measures  and  synthesis  of  open-loop  inputs.  Design  and 
characterization  of  stochastic  signals  is  given  in  Chapter  4.  Chapter 
5 presents  a study  of  open-loop  feedback  signals. 

In  Chapter  2 the  B-distance  is  defined  in  terms  of  the  likelihood 
ratio.  Some  properties  of  this  distance  measure  are  given  in  Section 
2.2.  Sections  2.3  and  2.4  establish  the  13 -distance  as  an  optimality 
criterion  for  signal  selection.  The  ordering  property  with  respect  to 


19 


the  error  probability  is  derived.  Upper  bounds  on  the  mutual 
information  in  terms  of  the  divergence  and  E - distance  are  derived  in 
Section  2.5  and  a lower  bound  on  the  mean-square  error  is  obtained 
using  the  rate-distortion  theory.  The  geometric  interpretation  of 
^tvy  is  used  to  re^atc  the  distance  measure  to  the  Fisher  information 
through  the  differential  metric. 

Synthesis  of  open-loop  inputs  in  linear  disc rete-time  systems  is 
considered  in  Chapter  3.  A description  of  the  identifiability  of 
parameters  is  summarized  in  Section  3.  1.  The  R-distance  is  derived 
for  the  state-space  model.  This  is  shown  to  be  the  sum  of  three 
terms  - a quadrative  term  in  UN  * , a linear  term  in  UN  1 and  a 
constant  term.  For  the  energy  constrained  case  the  inputs  are  shown 
to  be  the  eigenvector  of  a symmetric  positive  definite  matrix 
corresponding  to  its  largest  eigenvalue.  A sufficient  condition  for  the 
accuracy  of  learning  is  derived  as  the  nonnegativity  of  a quadratic 
form  of  the  inputs.  Numerical  results  show  that  the  optimal  inputs 
derived  above  can  be  approximated  by  a low  order  difference  equation. 
Two  distinct  numerical  examples  illustrating  the  open-loop  synthesis 
are  detailed  in  Section  3.  5. 

Chapter  4 is  concerned  with  the  synthesis  of  stochastic  inputs 
generated  from  linear  models  of  the  mixed  autoregressive  moving 
average  type.  The  use  of  these  inputs  provide  us  with  additional 
degrees  of  freedom  (in  selecting  the  optimal  sequence)  in  terms  of  the 


parameters  of  the  process.  Thus  these  inputs  can  be  considered  as 
being  generated  from  a white  noise  sequence,  using  a linear  filter. 
Section  4.  1 gives  a brief  discussion  of  various  linear  models. 


In  Section  4.2  properties  of  these  signals  are  studied  in  terms 
of  the  spectral  density  of  the  inputs.  The  system  is  considered  in  the 


form  of  a single-input  single-output  dynamics  and  the  sensitivity  of  the 


output  with  respect  to  the  parameters  is  used  as  the  optimality 
criterion.  By  approximating  the.  input  correlation  matrix  by 
appropriate  Toeplitz  forms  and  using  the  results  on  asymptotic  eigen- 
value distribution  of  Toeplitz  matrices,  the  optimum  value  of  the  cost 

.2 

function  is  derived.  When  the  variance  cr>  of  the  white  noise  is  fixed, 
at  the  optimum  cost  the  integral 


1 

„ 2 
2 v o 


(X  ) dX 


V -77 

also  attains  its  maximum  value.  Examples  are  presented  in  Section 
4.3  studying  the  characteristics  of  these  inputs. 


Chapter  5 deals  with  the  synthesis  of  feedback  inputs  which  are 
obtained  using  the  open-loop  feedback  technique.  At  each  stage  k of 
the  process,  future  inputs  are  selected  based  on  the  past  and  present 
observation  program,  and  the  first  component  of  this  subsequen 

is  selected  as  the  input  u ^ . The  feedback  is  introduced  when 
an  additional  observation  is  incorporated  in  computing  the  future 
inputs.  The  feedback  input  is  shown  to  be  equal  to  the  sum  of  open- 


Some  concluding  remarks  a 


this  field  are  given  in  Chapter  6 


CHAPTER  II 

TIIE  DISTANCE  MEASURE  AND  OPTIMALITY  CRITERION  FOR 
INPUT  SIGNAL  SYNTHESIS 

In  this  chapter  the  optimality  criterion  for  input  signal  selection 
is  obtained  in  terms  of  the  statistical  distance  measure  between 
distributions  characterizing  distinct  pairs  of  parameter  sets.  In 
particular  the  distance  function  is  defined  in  terms  of  the  Bhattacha  r y ya 
measure  (B-distance)  and  its  properties  are  studied.  In  Section  2.  1 
the  B-distance  is  defined  and  some  of  the  related  quantities  are  stated. 
Section  2.2  presents  a discussion  of  the  properties  of  this  measure. 
Some  of  these  results  are  useful  in  studying  the  asymptotic  behavior  of 
the  B-distance.  The  important  ordering  property  with  respect  to  the 
probability  of  error  is  presented  in  Section  2.  3. 

The  discussions  of  Section  2.4  show  that  the  statistical  distance 
measures  are  closely  related  to  Shannon's  definition  of  mutual 
information  [69]  and  thus  provide  an  alternative  to  using  the  latter. 

The  distance  measures  are  much  easier  to  compute  than  the  mutual 
information.  In  Section  2.  5 the  geometric  interpretation  of  the  B- 
distance  is  employed  in  relating  this  to  the  Fisher  information. 

2.  1 The  Dhattacharyya  Distance: 


;-r 


stochastic  process  parameterized  by  a p-dimensional  parameter 


? ( RP  where  G Ls  a compact  sub  set  in  RP  . Ret  u 


is  assume 


that  the  joint  probability  density  of  Z exists  and  is  given  by 
N i 

p(Z  [0)  . For  a given  pair  [9  , 6 ) in  © define  the  likelihood  rati 

^ / 


p(ZNl£a ) 

p(ZNliy) 


(2. 1-1) 


Define  the  quantity  0 (t)  as  (Chernoff  [ 2l],  Zacks  [84,  p 212]) 


PccY  (°  etlnL  p(-N'-y)  d-N  ' tc(0' 1}  (2-1-2) 


N 1 

where  fl  is  the  sample  space  of  Z . The  quantity  P^y  for  t=~ 

is  defined  at  the  Blattacharyya  coefficient  [l3,4l]  , 


puy{\)  = / [p(zN|ea>  p(zn | e_y> 1 2 dzN 


(2.  1-3) 


The  above  integral  also  appears  in  a non-statistical  context  in  a paper 
by  Hellingcr  [33].  Ideally  we  like  to  define 


p (t  ) = in  f p (t) 
te(O.l) 


(2. 1-4) 


Such  a quantity  is  analytically  hard  to  work  with  and  it  is  clear  that 


any  advantage  gained  by  (2.  1-4)  is  overshadowed  by  the  numerical 
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vj 


complexity  involved  in  implementing  p (t ' ) . Moreover,  it  i« 

Of  f 


showii  in  App.  A that  t = for  Gaussian  densities  with  equal 
covariances.  The  Bhattacharyya  distance  is  defined  as 


B = -In  p 

ay  1 ay 


(2.  1-5) 


This  quantity  has  also  been  used  by  Feller  [27]  to  study  the  limit 
theorems  of  some  of  random  variables.  Let 


F = P(x  * p] 
x (p) 


(2.  1-6) 


and 


p(t)  = 


= / e^dF(£) 


(2.  1-7) 


Denoting  the  new  random  variable  by  x the  distribution  of  x is  given 

by 


r 

■/ 


(2.  1-8) 


P (t) 


It  can  be  shown  that  F_  ^ j is  a distribution  function  and  the  random 


variable  x has  mean  and  variance  given  by 


w * pT)  / 


E[x]  = ;^v  / dF(C)  - ~ lnp(t) 


(2.  1-9) 


and 


.2 


Var(x)  = — - [in p(t )] 
dt 


(2.  1-10) 


J 


I . 1 1 " iii  pi ' 
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Some  discussion  with  regard  to  performance  hounds  is  also  given  in 
Van  Tres  [81,  pp  116-133]. 

A distance  function  closely  related  to  the  13 -distance  has  been 
studied  by  Matusita  [59]  and  is  a particular  case  of  the  measure 
defined  by  Jeffreys  [39,  p 174]. 

I i 

May  <m)  - J Up(zNlia)}m  [p(zN|ey)}m|  dzN  (2.1-11) 
a 

Now  the  Matinta  measure  is  defined  as 


l 

l 

I 2 

[M  Jz=  D 

ay  2 ay 

[/{ 

’pCzNlitt) 

2 

p(ZN|9y)  2}  dZN 

Simplifying  the  above  we  have 

Appendix  B gives  the  Bhattacharyya  distances  for  multivariate 
Gaussian,  Gamma,  Poisson  and  multinominal  distributions. 

2.  2 Properties  of  Bhattacharyya  Measure 

In  this  section  we  give  several  properties  of  the  B-distance. 
Proposition  2.  2.  1 If  x and  y are  two  independent  random  variables 


then 
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B . (x,  v)  = B (x)  4 B (v) 

ay-  ay K ayK’ 


Proof: 


puy{x,Y)=f f{ p(x*y|,V p(*-ylv}  2dxdy 

x y 

~Jf { p(xl?a)  p(yl8a)  p(x|ey)  p(y!ey)|  ^ cixdy 


- 2 2 
=y  {p(xl0a)  P(x!0y  )}  dx/{p(y[6a,p(y|ey,}  dy 


or  Pay  (x'y)  = Pay(x)  Pay (y) 


now  Bay(x,y)=  -InO  (X|y) 


= -In  p (x)  -In  ,0  (y) 

ay  «y 


= B (x)  + B (y) 

ay  ay  vy/ 


Then  in  general  if  the  random  variables  x,,  x_,  . . . , x are 

12  n 


independent  then 
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nco-  (K1  ■ x2 *„>  '£  nr/-/<xil 


(2.  2-2) 


Proposition  2.2-2 


properties 


(i)  0 < B < “ 

ay 


B has  the  following  metric  related 


(2. 2-3a) 


(ii)  B is  symmetric,  B - B 

oty  ay  ya 


(2.  2-3b) 


(2.  2-3c) 


(III)  B =0 

a.  a 


B^y  does  not  satisfy  the  triangular  inequality  for  a metric.  Let 
us  refer  to  B^y  as  the  pseudometric.  The  transformation  obtained 
via  the  Matusita  measure,  D , indeed  satisfies  the  trinacle 

ay  fe 

inequality. 


Proposition  2.  2.  3 


D +D,^D- 

ay  y 6 a6 


Proof:  We  use  Schwartz  inequality  to  derive  this 


Da6  = j/f{.><zNl4)}2-{p(,N!i6} 


2 dZN 


(2.  2-4) 


• jf  ({p(zNK>}2  -{pt2Niay)}2)-({p(sNH6)} 


1 2 


«.zN(2 


From  Couchy  - Schwartz  inequality 


The  second  term  in  square  brackets  in  (2.2-5)  is  nonnegative. 
Hence  it  follows  from  (2.2-5)  that 


D < D + D , 

ao  ay  yo 


13y  defining 


p(t)  = In J p^i^)1  PlZ^l^y)1'1  dZN  , te[0,  1]  (2.2-6) 


We  will  show  that  p(t)  is  a convex  function  of  t . From  this  it  is 


possible  to  study  p(t)  as  a function  of  t . 
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Then  clearly 


/t  1-t  Po  N <f  t l-l  N / t 
p p In d /.  / P P cl / / p 

o'  y p - 1 J 1 y - J 


t 1-t  .__N  / t 1-t  , ,_N 


p p dZ*  / p p ln( — — ) dZ 

1 y - «/  Fa  Fy  p - 


(2. 2-7) 


From  (2.2-6)  and  (2.2-7)  we  have 


-■  P > 0=»p(t)  is  a convex  function  of  tf[0,  l] 
dt 


Another  important  property  of  the  distance  measure  is  its  monotonic 

behavior  v/ith  an  increasing  number  of  observations.  If  B,  is  the 

k 

CD 

k I > 

pair  wire  distance  for  observations  Z , then  the  sequence  j B ' 

1 k|k=l 

is  nondecreasing.  Hence  we  have  the  following  result. 


Proposition  2,2-5  If  Jz  J is  a sequence  of  observations  of  a 

1 k=l 

stochastic  process  then 


h + i "h 


0 


Proof: 


°c,y , k + 1 = f\  p(-k  ' ' li«  > k + ’ 1 9-  >!  2 dz 


k+  1 


-y  | 


f ''"I  [p(2k|£a)p(-k|V] 


[i>(\  H i p<*k i V] 2 + , 


This  result  is  useful  in  studying  the  asymptotic  relationship  between 
the  distance  function  and  the  probability  of  error. 
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2.  3 Ordering  Property  of  the  D -distance 

We  would  like  for  a distance  measure  d , which  is  defined 

o.y 

as  the  statistical  measure  of  the  affinity  of  probability  measures  of 
the  observations  characterized  by  9 and  6 , to  have  the  following 

— a / 

property':  For  two  experimental  designs  and  if  , 

then  P (0  0 ;U  ) < P (ft  6 ; Ur  ) where  P is  the  Bayes1 

probribility  of  error  for  specified  prior  probabilities.  In  such  a 
situation  we  say  that  the  experiment  U is  more  informative  than 
(symbolically  ) . The  input  signal  U appears  in  the 

distance  function  via  the  system  dynamics  and  measurement  process. 


The  ordering  of  the  B-distance  with  respect  to  the  probability  of  error 
can  be  proved  using  a theorem  of  Hardy,  Bittlewood  and  Polya  [35]. 


Results  similar  to  that  of  these  authors  has  also  been  obtained  by 
Blackwell  [14].  W e will  use  a lemma  due  to  Bradt  and  Karlin  [ 1 6]  to 


express  the  probability  of  error  in  an  integral  form. 


2.  3.  1 Probability  of  Error  for  the  Multi-Hypotheses  Case 

Define  the  total  probability'  of  error  as  the  probability  of 

error  incurred  in  identifying  the  correct  values  of  the  system 

parameters.  Let  P(9  - i 0)=  P.t  i=l,2 m be  the  set  of  prior 

probabilities.  Bet  the  cost  of  each  course  of  action  be  C..  . The  first 

subscript  signifies  that  the  ith  hypothesis  is  chosen  and  the  second 


signifies  that  the  jth  hypothesis  is  true.  Denoting  the  region  of  the 
observation  space  fi  in  which  we  choose  as  , the  total  risk  is 
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g iven  by 


in  m 


R = 


2E  ‘■jCy  j 
1=1  J*1  C 


/-p(.N|j0)dzN 


(2.3-1) 


If  C..  ~ ® ^ i + j then  the  resulting  expression  represents  the  total 
probability  of  error.  Thus  the  probability  of  error  P is  given  by 


P 

e 


m m 

EE 


i=l  j = i 


p.  J p(zN|.9  ) dzN 
[2. 


(2.3-2) 


Bayes'  likelihood  ratio  decision  rule  applied  to  the  sample  Z^ 
requires  that  we  rank  the  parameter  sets  P , i=l,2 m by  the 

l— 

following  rule.  For  all  i*j  the  set  .P  is  favored  to  the  set  P if 

l“  j~ 


p(ZN|.P)  P. 

> A = — . 1 

p(ZN|.9)  ‘J  Pi 


(2. 3-3) 


The  maximal  element  of  this  ordering  is  the  class  to  which 
assigned.  Let 


S.. 

u 


p(zn|.9) 
— 1~ 

p(zn|jP) 


is 


(2.3-4) 


Then  the  probability  of  error  incurred  when  the  decision  rule  is  Bayes' 
. N 

and  Z belongs  to  hypotheses  k is 


pc<i.j)^/j1'iWZ-Nlie){  jj..  p(zK|.fl)j’  ‘d/N 


or 


p,  * "z  z p*  p ;-*/  P<p.N i ^ »i  p(p.N )■  -*  dz 


N 


(2.3-11) 


i=  1 k~i  + 1 


The  last  integral  is  the  moment  generating  function  p(t) 


m- 1 m 


P s 
e 


E E P1  pj  1 Vl) 

i-1  j = i -f  1 


(2. 3-12) 


The  tightest  bound  is  obtained  for  which  the  right  hand  side  of  (2.3-10) 
is  minimized  with  respect  to  t . This  gives 


m-1  in 

P s inf  J'j  ^ 

tc(0,  1)  i=l  j = i + 1 


Pt  P1_t  p..( t) 
i j ij 


(2. 3-13) 


For  the  class  of  upper  bounds  for  which  t = — we  have  [50] 


m-  1 m 


SZE  pipji  Pij'l' 


(2. 3-14) 


i=l  j = L4  1 


The  upper  bounds  obtained  here  do  not  assume  any  form  of 


distribution  for  the  7. 
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2.3.2  rI  he  Ordering  Property  of  the  Distance  Measure 
For  two  experimental  designs  and  U„  , if 

*W<V 1 V<u<?  impUcs  pe<W  Vs  Pe(W  V «*» 

such  a property  of  the  distance  measure  is  referred  to  as  the  ordering 
property.  A necessary  condition  for  the  probability  ordering  can  be 
derived  using  a theorem  of  Hardy,  Littlewood  and  Polya  [35].  Result 
similar  to  that  of  the  above  authors  has  been  obtained  by  Blackwell 


[14]. 


The  following  lemma  of  Bradt  and  Karlin  [16]  will  be 


used  in  deriving  the  ordering  property  of  the  B-distance. 
Lemma  2.3-1  A necessary  and  sufficient  condition  that 

(9  ,I.UJS  P (9  8 U..)  is  that 

e —a  — y i r?  e — o:  — y i c 


p(ZN;  U \s  ) X 

N n - « , o]p(Z  ; uje  )dz' 

P(2  i % liy)  ( n 7 


/(  p(ZN.  uJfi  ) i 

minJ — ft — ' 0 p(-^N;  Uc^ 

(p(zN.u^!oy)  ) c 


y)  d Z 


(2. 3-15) 


where  £ = P ' /P 

y a 


This  result  expresses  the  error  probability  as  a function  of  the 


likelihood  ratio 


L=  p(ZN]0a)/P(ZN|fl. 


(2.3-16) 


The  following  theorem  of  Hardy,  Littlewood  and  Polya  [3 S J can  be 


specialized  to  obtain  Blackwell's  theorem  as  follows. 
Theorem  2.3-1  Suppose  p(x)  and  q(x)  are  given  such  that 


J p(x)  dx  = f q(x)  dx 

R R 


(2. 3-17) 


Then  a necessary  and  sufficient  condition  that 


p(x)  dx 


R 


(2. 3-18) 


for  every  d>("  ) , convex  and  and  continuous  in  a closed  interval 
including  all  values  of  f and  g , is  that 


R 

for  all  y . 


J max  | g(x)-y  , o|q(x)dx  s.^max  |f(xl  - y , Oj.  p(x)dx  (2.3-19) 
R R 


K 

Using  the  lemma  (2.3-1)  and  theorem  (2.3-1)  we  have  the  following 
theorem  relating  the  probability  of  error  to  the  ordering  of  (2.3-18). 


Theorem  2.3-2 


If  U and 

V 


are  two  experimental  designs. 


P (9 

c —U 


Ti 


uc ) 


P (9 

C 


U-  ) 


(2. 3-20) 


for  all  prior  probabilities  (l1  , P ) , if  and  only  if 
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E[+(S)|-J  5 E[*(I-{!,-J 


for  all  continuous  concave  functions  4>(L) 


The  necessary  condition  for  the  ordering  of  error  probability  in  term: 
of  the  B-distance  is  given  by  the  following  theorem. 


Theorem  2.  3-3  If  P (9  , 9 ; U ) P (9  ,0  ; U„)  for  all  prior 

e -a  — y 77  e -a  -y  C 

probabilities  (P  , P ) then 

ay 


Proof:  By  writing  0 as 

y 


|e  ) 

—7---  • P^N|9y)dZN 

P(Z  \Hy ) y 


* E[ 


We  notice  that  (2.3-21)  is  of  the  form  4*  (L)  - ^ L , a continuous 
concave  function  of  L . Thus  it  is  clear  that  whenever 


rc(-a'-y  : V S Pe  (— a ’ -y  : V we  havt 


p (U  ) S o (U„) 
cry  rj  ay'  C' 


1 i 


40 


By  definition  B^  y = -In  ,0^  and  (2.  3 -22)  follows. 


rheorem  (2.3-3)  gives  a necessary  condition  for  the  optimality  of  erroi 
probability.  Sufficiency  of  the  ordering  property  is  true  only  when  the 
conditions  Oi  theorem  (2.  3-2)  are  satisfied.  The  performance  of 
B-distance  and  divergence  will  be  studied  for  the  case  when  the  prior 
probabilities  are  all  equal. 


For  the  special  case  of  independently  and  identically 

distributed  observations  it  was  shown  by  Chenoff  [21]  that  the 

B-distance  is  exponentially  optimum,  that  is,  p essentially 

ro;  y 3 

represents  the  probability  of  error  - 


Or  y 

e for  large  N 


(2. 3-25) 


Generalization  of  this  result  to  nonidentical  distributions  can  be 
obtained  using  a limit  theorem  due  to  Feller  [27], 

Since  the  Bayes'  probability  of  error  is  minimized  by 
choosing  the  parameter  set  with  the  largest  a posteriori  probability 
P(9JZ  ) , whenever  the  condition  of  theorem  (2.3-3)  is  satisfied,  the 
optimal  inputs  obtained  by  maximizing  the  B-distance  will  also  result 
in  the  highest  a posteriori  probability,  containing  maximum  information 
about  the  parameters. 


The  generalization  of  the  above  theorem  to  the  case  when 


when  the  parameter  space  © contains  more  than  two  points  can  be 
obtained  using  the  average  distanc  e.  Each  pairwise  error  probability 
can  be  used  to  uniformly  approximate  the  corresponding  Bhattacharyya 
coefficient  t y , through  the  minimum  function.  Thus  with  each 

probability  pair  we  can  associate  the  corresponding  pairwise  distance. 
Since  the  total  probability  is  the  sum  of  all  such  measures,  the  distance 
function  is  also  the  sum  of  all  possible  pairs  of  distances.  The 
following  theorem  generalizes  this  result. 

Theorem  2.  3-4  When  the  parameter  space  © contains  more  than  two 

points,  the  total  probability  of  error  P is  given  by  (2.3-2)  and 

whenever  P (U  ) ^ P (U~)  then  B(U  ) s B(U~)  . 

e V e C V C O 


The  average  distance  B is  given  by 


IS 


dP(6  ) dP(«  ) 

o:  y —a  —y 


(2. 3-26) 


© x © 


When  the  parameters  take  on  a finite  number  of  discrete  values 


m m 


-EE  p,  Pj 

1=1  j=l  J 


(2. 3-27) 


2.  4 Mutual  Information  and  Distance  Measures 

It  is  an  interesting  fact  to  note  that  the  Bhattacharyya  distance  and 
the  divergence  (defined  in  Chapter  1)  arc  closely  related  to  Shannon's 
mutual  information  [68,  69].  The  average  mutual  information  I(Z^;  3) 
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r\ 


i 


i 


L/ 


represents  the  mean  amount  of  information  that  knowledge  of  the  value 
assumed  by  Z supplies  about  the  value  of  the  parameter  6 . In 
communication  systems  these  two  quantities  can  be  treated  as  input 
X and  output  Y of  a given  channel  and  the  capacity  of  the  channel  is 
expressed  as 

C = max  I(X; Y ) (2.4-1) 

whei’e  the  maximum  is  taken  over  all  the  possible  choices  of  the  input 
distributions. 

The  distance  function,  though  not  detined  in  a similar  manner, 
analogously  represents  the  affinity  of  two  distributions  cha racter izing 
the  parameter  sets.  Thus  larger  the  distance  measure,  greater  is  the 
knowledge  of  correct  parameter  sets  provided  by  the  observations. 
Hence  it  is  not  surprising  that  there  exists  a certain  relationship 
between  these  two  information  measures  and  it  is  shown  in  this  section 
that  the  measures  - divergence  and  B-distance  bound  the  mutual 
information.  This  also  leads  to  a lower  bound  on  the  mean- square 
error  through  the  rate  distortion  theory. 

2.4.1  Upper  Bound  on  the  Mutual  Information 
Definition:  The  average  mutual  information  I(ZN;£)  is  given  by 


i(ZN;0)  = f J p(ZN,P)  In 


0 


p(Z'N,  OJI 

p(ZN)  p(£) 


d/N  d9 


(2.4-?.) 


This  can  also  bo  written  as 
.N 


I(ZN;  0)  = H(9)  - H(A  1/N) 
where  H(9)  is  the  entropy  of  0 

H(0)  = -J~ p(£)  In  p(0^)  df) 

0 

and  II(0(ZN)  is  the  conditional  entropy 

H(9|ZN)=  -J  J'  p(7N,0)  In  P(S|ZN)  dZN  d0 


(2.4-3) 


(2.4-4) 


(2.4-5) 


zN  e 


"In  words  (2.5-3)  expresses  the  intuitively  plausible  fact  that  the 
average  information  supplied  about  _6  by  specification  of  lz  ! equals 

t~k  ( l 

the  average  a priori  uncertainty  in  0_  minus  the  average  uncertainty  in 
0 that  still  remains  after  \z,  I ^ is  specified.  The  following  theorem 

rkji  1 

provides  the  upper  bound  on  the  mutual  information  in  terms  of  the 
distance  function. 


Theorem  2.4  -1  The  upper  bound  on  the  mutual  information  is  given 

by 

TVT  1 — 

(2.  4-6) 


I(ZN;  0J*  ~ J 


For  the  Gaussian  case  with  equal  covariance  functions  J = 8B  and 


Mere 


I(ZN;  0)  S 413 

(2.4-7) 

r i 

J = E J 

ccyV  ayJ 

(2.4-8) 

and 


■ 


J«y  -/  [ P(2Nl5ff)  - P(ZN|S,.)]l 


P(Z'  |£j 


Proof:  Using  the  definition  (2.4  -2)  we  can  write 


I(ZN;  6)  = -f  p(ZN)  In  p(ZN)  dZN 


*y*  I* p(zn.  § 


S ) In  p(Z/  | 0)  dZ^  df) 


zN  e 


= J1  + J2 


Consider  the  first  integral  I . 


/p(ZN)  In  p(ZN)  dZN  = /*j  C p(ZN|fl  ) p(3  ) dfl 


In  ( /’p(ZN|fl)  p(0)  d0  I dZ1 

\ J I “ 


Using  the  inequality  [36,  p 150,  theorem  204] 
ln  A P(ZN  I P ) p(6  ) dO  / In  [p(ZN  | 9 )j  p(0  ) dO 


(2.4-1  1) 


we  obtain 


j J [1n  pe-  l5)Jp(z-  l-v 

zN  e o 

— a.  y 
Add  and  subtract 


de  idzN 

- >1  - 


(2.4-12) 


on  the  right  hand  side  of  (2.4  -11).  Rearranging  terms  and  using  the 
definition  (2.4  -8)  it  follows 


- I,  * - l I I J d(9  ) p(P  ) dO  de 
l 2 J J v.y  a * ~y  ~a  -y 

e ev 

N N i N 

p(Z  , 0)  In  p(Z  |p  ) dZ  d£ 


or 

From  (2.4  -9)  and  (2.4  -12)  we  get 


N 

I (Z  l £ ) •' 


J 


(2.4-14) 
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) 


V. J 


For  Gaussian  signals  with  equal  covariances  [see  Chapter  3]  v/e  have 
J = SB  and 


I(ZN;  0 ) * 4 B 


(2.  4-13) 

0 


When  the  parameters  have  a discrete  distribution  the  average 

divergence  takes  the  form 
m - 1 m 


i=l j=i  + 1 


P P J 
i J iJ 


(2.4-16) 


Evaluation  of  the  distance  measures  is  much  easier  than  the 
evaluation  of  the  mutual  information  for  multiple  observations  (N  ^ 2)  . 
The  above  relationships  indicate  that  maximization  of  the  distance 
function  will  also  maximize  the  mutual  information.  The  results  of 
(2.4-14)  and  (2.4-15)  represent  the  distance  functions  as  alternatives 
to  using  the  mutual  information.  These  results  are  general  and  are 
independent  of  the  underlying  distributions  of  the  observation  process. 


2.4 . 2 The  Rate  Distortion  Lower  Bound  on  the  Mean  Square 
Error 

The  Cramer -Rao  bound  is  a widely  used  lower  bound  on 
the  mean  square  error  and  obtained  in  terms  of  the  Fisher  information. 
For  unbiased  estimates  of  Oclt  , the  C-R  bound  is  given  by  [81,  p 66] 


L> 


e [(9  - e >2]  •=  Je[{—  inP(zN.  eij2 


-1 


(2.4-17) 
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ljOWCr  bounds  are  also  derived  using  Shannon's  rate  distortion  theory 
In  this  section  a lower  bound  on  the  estimation  error  is  derived 
by  using  the  upper  bound  on  the  mutual  information.  I(/N;  0)  derived 


in  Section  (2.4  - 1 ). 


Let  the  estimate  of  0e  Ii  corresponding  to  the 
observation  ZN  be  denoted  by  0 = p (ZN)  . Let  p(ZN|0)  denote  the 
transition  density  of  Z for  a given  0 . The  distortion  in  the 
estimate  of  0 is 


d(0,9)  - (0  - 0)2 


and  the  average  squared  distortion  is 


(2.4-18) 


=//d(9-° 


0 ) p(6,  0 ) d0  d0 


(2.4-19) 


Define  the  set  T (D)  as 


©(D)  = ^ p(6  l® ) 1 Ee,  o[d(0,^)]  * 


(2.4-20) 


I hen  the  Shannon's  rate  distortion  function  is  given  by  [70,  68  , pp.  141 

155]. 


R0(D)  = inf  1(0  ;0 ) 

P(0[O ) cr.(D) 


(2.4-21) 


where 


do  d0 


(2.  4-22) 


me>  JJ 

e 6 


p(0, 8 ) In 


p(0  ) p(§  ) 


Thus  Rg(D)  represents  the  minimum  amount  of  information  to  be 

conveyed  to  the  estimator  in  order  to  attain  the  required  accuracy  D . 

A lower  bound  for  the  rate  distortion  RC(D)  can  be  obtained  when  D 

o 

2 

is  equal  to  the  mean  square  fidelity  criterion  C . Following  the 
development  in  Berger  [12,  pp.  92-102]  it  can  be  shown  that 

Re(C?)>-  me)  - In  (2  7 Te  e?')  (2.4-23) 

2 

From  the  definition  of  R0(  ( ) we  have 

o 

Ke(c2)s  1(0;$.)*  I (9;  ZN)  (2.4-24) 

The  last  inequality  follows  from  the  fact  that  any  information  contained 

A N n N 

in  6 is  also  contained  in  Z . Noting  that  I (0;  Z ) = I (Z  ;0)  we 

get 

I(ZN;0)^  11(8  ) - | In  (2  V c c?')  (2.4-25) 

From  this  the  lower  bound  is 

e[(8-0)?]  * exp  [2  H(0  ) - I (Z  N;  0 ) ] (2.4-26) 

The  main  result  is  obtained  in  the  following  theorem. 

Theorem  2.4  -2  Let  p(0  ) represent  the  prior  distribution  of  0 and 
J be  the  average  divergence  measure  given  by  (2.5-8).  Then 


: 


u 
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r 

A ?- 

1 r — i 

4 

0 - 6)  J 

----- 

(2.  4-27) 

Proof:  The  result  follows  from  (2.5-26)  and  the  inequality 


Remarks 

1.  When  the  parameter  set  is  discrete  with  equal  prior  probabilities, 

P.  = V-  i,  the  lower  bound  becomes 

i m 

e[(9-6)2]  a ~ — exp  [-j]  (2.4-28) 

2.  For  Gaussian  random  variables  with  equal  covariance  functions 
J = 8 B and  we  have 

e[(6  - 6)2]  a — — exp  ^2  H(0  ) - 8dJ  (2.4-29) 

The  appearance  of  the  distance  measure  in  the  lower 
bound  is  intuitively  appealing  for  its  use  as  a criterion  for  signal 
selection.  It  has  been  observed  that  there  is  a region  of  input  signal-to- 
noise  ratio  in  which  the  lower  bound  of  (2.4  -26)  performs  better  than  the 
Cramer-Rao  bound  [80].  Even  though  the  lower  bounds  derived  here 
cannot  be  considered  as  strong  competitors  for  the  existing  bounds,  the 
relationship  between  the  mutual  information  and  the  statistical  distance 
functions  is  quite  useful  in  applications  to  signal  design  and  feature 
extraction  problems. 


(J 


so 


2 . 5 0 n of  t!lP  D-riistancc  and  ils  Relationship 

to  the  Fi  sher  Inform  a l L o n 

I 

Bhattacharyya  proposed  that  the  numbers  Jp(Z  | Q ))  2 and 
|p(Z  |^.y)j  2 can  be  regarded  as  the  direction  cosines  of  two  vectors 
in  the  space  of  Z . Alternately  we  can  consider  them  as  defining 

O 

two  points  on  a unit  hypersphere  with  the  angle  between  them  given  by 

1 

,_N 

(2.5-1) 


cos 


= /jp(zN|6a)P(zN|9y 

fi 


)/  2 dZN  * o 

1 “ 


ay 


The  angle  between  them  must  clearly  be  between  0 and  - 

2 * 

The  above  definition  becomes  clear  by  considering  two  multi- 

nominal  populations  characterized  by  the  population  probability 

(WJ  , ff2  , v n)  and  (ffj  , ttz ffn)  . Then  as 

n 


v= 1 ■ VC 


ft  ) and 


" 1 ' ^^2  ' J V can  be  considered  as  the  direction  cosines  of 

two  straight  lines  in  n-dimensional  space  referred  to  a system  of 
orthogonal  coordinate  axes.  These  lines  may  be  called  population 
lines.  The  angle  between  the  lines  may  be  called  the  angle  of 
divergence;  any  monotonically  increasing  s ingle -valued  function  of 
this  angle  may  be  used  as  a measure  of  divergence.  Then  if  A is  the 
angle  between  the  lines  we  have 


— — — 
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1 

n >2 
COS  A = Y"'  I-  77 

I l 1 ) 

i=l 


(?..  5-2) 


When  A = ^ the  separation  between  the  populations  is  maximum  and 
when  A = 0 , the  two  populations  essentially  coincide,  as  the  affinity 
between  them  becomes  maximum. 


The  Fisher  information  matrix  defined  in  Chapter  1 gives  us  the 
sensitivity  of  the  observations  with  respect  to  the  parameters  0C  RP 
when  _0  has  a continuous  distribution.  The  Fisher  information  can  be 
related  to  the  B-distance  via  the  quadratic  differential  metric.  The 
Fisher  information  is  defined  by  the  elements  of  the  pxp  matrix  as 


M 


ij = K[i^r  i"p(/.K|o)  4 r lnp(zNl£)l 

1 j J 


(2.5-3) 


Consider  a perturbation  of  Oc  R*5  . If  (|6sj|  is  the  arcual 

( N | 

length  along  the  hypersphere  of  \Z  i between  the  points  with 

N J " 1 

direction  cosines  p(Z  |9j  2 and  p(Z  |9  +A^)2  then  we  can  write 


. 0 , 0 + A 9 = cos 


5 II 6 s H =y*  j P(zN|e)  p(zN|e  + A0)j  2 dzN 


n ■ 


(2.5-4) 


Let  us  state  the  following  regularity  conditions  on  the  density  function 
p = | 9 ) • 


(i)  The  partial  derivatives 
V i,  j = 1, 2, p . 


a 


In  p and 


dQ  99 

1 j 


In  p exist 


- - - 


(ii) 


2 

AH. 

V 

< F(Z  ) , 

9 P 

30 

3 0 3 9 

i 

1 j 

<C(ZN) 


(2.5-5) 


N N 

i,  j = 1, 2, p where  F(Z  ) and  G(Z  ) are  intergrals  over 


J 


n . 

iii) 


J it F di-N  = 0 -/ 


dZN  = 0 , Y L,j  = 1,2, , p 


n 


o 


d e.dO. 
1 j 


I 

I j 2 

Expanding  jp(Z  |£  + A 0.  j by  Taylor  series  about  £ 


(2.5-6) 


| P(ZN  I©  + A0  ) j 


I2 


p(zN|£) 


2 JL 

2 . P '2 

+ 

? 

i-  1 


ly  p aa  / e 

ZH  P 90.  r ^ 


p p 
i=l  i = l 


II  9t3  ap  + __a p 


4 p 9 0 90  30  3 0 

L j » j 


(2.5-7) 


Equation  (2.  5-4)  becomes 


cos 


!MI=/jp<2>>+!£  -ff;  as, 


a 


i- 1 


P P 

~EE 

i l y-1  L 


2 


II  9 P a P d P 

4 p 0 9 . 30  . c)9.d0  . 

1 J 1 J 


A 0 /V  0 
1 j 


dzNf, 


. (2.5-8) 


3_P 

39  . 

J 
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Using  the  regularity  conditions  we  obtain 


P P 


IM  = i-IEE 


i-i  j^i 


_A_  inp(zN!9)  _|-in  P(xN|0) 

' 1 'j 


p(Z  |0)  dZ  • AG.  A«.  + . . . 
’ i J 


Rewriting  (2.5-9) 


1 P P 

1 - cos  |! 6 s j!  A 0^  A 9. 


i=l  j=l 


i tic  ii  o • 2 !1  ^ s 1!  , 

1 - cos  I* 0 s Ji  ?»  2 sin  w 2 • — — 


F rom  (2.  5-10)  and  (2.  5-11) 


P P 


M ■=  ?E  E Mij  AeiAej 

i-i  j=i 


where  M..  are  the  elements  of  Fisher  information  matrix  defined 

U 

by  (2.6-3).  By  definition 


0 , G + A 0 = cos 


IM  ~ i - 


(2. 5-13) 
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n 


B = - In  p 

9 , e 4 a e p , a + a o 


= - In 


!M_2  j 


(2. 5-14) 


For  small  |J  6 s | 


we  have 


- In 


[» 


Thus 


B 


e , e -i  as 


i=1  j=i 


M . 6 9.  A 9 
iJ  i j 


(2.5-15) 


This  indicates  that  in  order  to  distinguish  between  unlike  parameter 
sets  increasing  the  differential  metric  is  equivalent  to  increasing  the 
sensitivities  of  the  observations  v/ith  respect  to  infinitesimal  changes 
in  the  values  of  the  parameters.  Thus  a close  relationship  between 
the  distance  measure  and  the  Fisher  information  exists,  the  distance 
function  representing  the  difference  in  the  distributions  for  neighboring 
parameters.  The  differential  metric  suggests  that  this  quantity  can 
itself  be  used  as  a criterion  function  for  signal  synthesis.  However, 
there  is  no  guarantee  that  A0.  are  small  enough  to  obtain  a 
meaningful  approximation  and  it  may  be  that  these  approximations  are 
very  bad  from  stage  to  stage  in  the  estimation  of  9^  . Expansion  of  the 
transfer  function  matrix  in  the  frequency  domain  lias  been  used  by 
Mchra  [(>?.]  in  approximating  a state  space  model  to  a regression  model. 
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The  purpose  here  is  only  to  show  the  connection  betv/een  different 
information  measures,  rather  than  to  use  it  for  problem  solutions. 

Summarizing  the  discussions  in  this  Chapter,  the  Bhattacharyya 
distance  is  defined  and  its  ordering  with  respect  to  the  probability  of 


error  P is  obtained  as  a necessary  condition  for  the  optimality  of 
e 

P . Thus  the  input  signals  obtained  this  way  will  result  in  the 
e 

N 

maximum  a posterori  probability  P(P  ! Z ) representing  the 


parameter  learning.  It  is  also  shown  that  the  distance  measures  are 
closely  related  to  the  mutual  information  and  the  maximization  of 
distance  functions  wall  also  maximize  the  information  rate.  It  is  much 
easier  to  implement  the  distance  functions  numerically  in  designing  the 
input  signals.  It  is  further  shown  that  the  differential  metric 
B0  , 0 + /S  6 represents  the  sensitivity  of  the  observations  with  respect 
to  the  parameters,  in  terms  of  the  Fisher  information.  Thus  the 
distance  measure,  in  a way  reflects  the  sensitivity  index  to  be 
maximized  to  choose  the  optimal  signals.  In  the  next  chapter  we  apply 
the  distance  functions  and  make  a comparison  of  their  performance  in 
parameter  learning. 


CHAPTER  III 


SIGNAL  SELECTION  IN  LINT 


CAR  DYNAMICAL  SYSTEMS 


The  problem  of  designing  optimal  inputs  in  linear  systems  is 
described  in  this  chapter.  The  distance  measure  is  derived  for  the 
general  linear  Gaussian  case.  The  optimization  problem  is  solved 
when  the  inputs  arc  constrained  to  have  total  energy  less  than  or  equal 
to  a given  value.  For  this  case  the  optimal  inputs  arc  given  by  the 
eigenvector  corresponding  to  the  largest  eigenvalue  of  a symmetric 
positive  definite  matrix.  When  the  number  of  stages  N is  large,  a 
numerical  method  is  derived  to  obtain  the  optimal  signal  vector. 
Comparison  of  the  performances  using  the  B-distance  and  the 
divergence  measures  is  made  as  a function  of  the  dynamic  noise 
covariance.  Optimal  input  sequences  are  obtained  with  these  two 
criteria  and  the  numerical  results  show  that  for  increasing  value  of 
the  noise  covariance,  the  learning  with  B-distance  is  better  than  with 
the  divergence.  This  comparison  is  made  for  the  case  of  equal  prior 
probabilities  of  the  parameter  sets.  Two  examples  are  presented 
illustrating  various  aspects  of  the  signal  synthesis  problem. 

The  statement  of  the  problem  and  a brief  discussion  of  the 
identifiab ility  of  parameters  in  linear  discrete-time  systems  is  given 
in  Section  (3.  1).  The  distance  measure  is  derived  in  Section  (3.2)  and 
the  optimization  problem  is  discussed  in  Section  (3.3).  A sufficient 
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' 


condition  is  obtained  for  the  asymptotic  accuracy  of  learning.  This  is 
similar  to  the  condition  of  persistent  excitation  of  the  input  sequence 
given  by  Ast.rSm  and  Bohlin  [6].  The  open-loop  inputs  can  be 
characterized  as  satisfying  a linear  difference  equation  of  low  order. 
This  is  discussed  in  Section  (3.-1).  Two  numerical  examples  are 
presented  in  Section  (3.  5). 

3'  * Statement  of  the  Problem  for  the  General  Discrete  Time 
Stochastic  I. inear  System 

The  signal  selection  problem  for  the  linear  Gaussian  system  is 
presented  with  a brief  discussion  of  parameter  identifiability  in  discrete 
time  systems. 

3.  1.  1 Problem  Statement 

Consider  the  following  representation  of  the  dynamic 

system 

-k  = #2k-l  +£Uk-l  ^ -k-1  (3.1-1) 

-k  = IJ^k  + -k  ' k=1'2'-'-  (3.1-2) 

$(n  X n)  , H(s  X n)  and  6(n  x 1)  are  time  invariant  matrices,  w and 
vk  are  Gaussian  white  noise  sequences 


^k~ 

G(0,Q) 

(3. 1-3) 

^k  ~ 

G(0,R) 

(3. 1-4) 

Eh 

Tl 

ykJ,  0 , V j,  k 

(3. 1-5) 
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The  initial  state  has  a Guassian  distribution  with 


— 0 ~ G(^o  ’ po) 


It  is  further  assumed  that  x^  is  independent  of  j , k-0,  1, 


|v  , k = 1,2,....)  s o tha  t 

rk  i 


o 


, k - 0,1,2,... 


= 0 , k - 1, 2,  . . . 


(3. 1-6) 


. . | and 


(3.  1-7) 
(3. 1-8) 


The  above  system  representation  is  popularly  known  as  the  state  space 
representation  (e.  g.  Sorenson  [73,  p 111]).  The  unknown  parameters 
6 contained  in  the  set  (3  , §_)  . Let  $ be  a stable  matrix.  Let 
0 C 0C  RC1  where  O is  a compact  set  in  . Let  P(£)  denote  the 
joint  a priori  distribution  of  £ . When  0.  has  a discrete  distribution 
such  that  £ belongs  to  a finite  set  of  discrete  values,  let 


P(£  " .£)  = P.  , i = 1,2, 


(3.  1-9) 


ill 

P.  * 0 and  T'  P.  = 1 

i . — i 


(3.  1-10) 


Define  the  following  notations 


,.'k  T mk 

A ( ^ j * ^2  ^ ) c R 


(3. 1-11) 


k a i ,T  k + 1 

U A (u0  , u j , . . . , u^)  c R 


(3.  1-12) 


N-l  N 

Let  the  vector  U C R of  inputs  be  constrained  to  the  set  O 
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where  U is  a compact  set  in  R . Two  types  of  constraints  belong 
to  this  category  - the  energy  constraint 

UE  = |-N  1 I II II  s V | (3.  1-13) 

and  the  magnitude  constraint 

UM  = j UN_1  | ak  5 uk  * bk  , k = 0.  1, N-lj  (3.  1-14) 

The  optimal  input  design  is  defined  by  the  class  of  admissible  inputs 
that  maximize  the  distance  measure  derived  from  the  system 
described  by  (3.  1-1)  - (3.  1-10).  No  probability  distribution  is 
associated  with  these  inputs  and  they  belong  to  the  class  of  open-loop 
deterministic  inputs. 

The  multistate  structure  of  (3.  1-1)  and  (3.  1-2)  in  the 
cononical  form  reduces  to  the  scalar  single-input  single-output  (S1S0) 
form  which  we  will  discuss  in  Section  (3.  1-2).  We  also  make  use  of 
this  representation  in  Chapter  4 where  the  synthesis  of  random  inputs 
will  be  discussed. 

3.1.2  Identifiability  of  Parameters 

In  this  section  we  define  the  concepts  of  identifiability 
and  state  cond:tions  under  which  a linear  system  is  identifiable. 

These  results  require  that  the  system  be  completely  controllable  and 
observable.  First  we  define  these  terms  for  a deterministic  system. 
Tiiese  concepts  were  first  introduced  by  Kalman  [43]  in  the 
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Theorem  3.  1. 2 [60  p -18]  The  system  (D)  is  completely 
observable  if  and  only  if  the  nxinn  matrix 


has  rank  n . 


Now  let  us  define  the  identifiability  of  6_  and  the  conditions  for 
identifiability  when  the  system  is  observed  with  noise  present. 

A A k ( ) °“ 

Definition  3.  1,3  [76]  Let  6 = 0 (Z  ) where  {z,  ( is  a 

k “ (~k  'k=l 

sequence  of  observations  parameterized  by  0 , A sequence  of 
estimates  is  said  to  be  consistent  in  probability,  if  for  any  6,  e 
arbitrarily  small,  there  exists  an  N(e  , 6 ) > » such  that  for 
n > N(C  , 6 ) 


Pr  d(0n,  0 ) > 6]  < e 
where  d is  a metric  defined  in  © C Rq  . 


Definition  3.  1.4  [76]  The  parameter  6^  is  said  to  be  identifiable  if 

( a »“ 

there  exists  a sequence  of  estimates  < 0 J , which  is  consistent 

| — n j n=  1 

in  probability,  viz. 


A P 

e e 

— n — 


(3. 1-20) 


Consider  the  system  described  in  (3.  1-1)  and  (3.  1-2)  without  the 


control  variables  u.  . 

k 


u 


X “Ox  -f  w 

-k  -k-1  -k-1 


ik = H^k + ^ 


The  steady  slate  Kalman  filter  representation  of  (S)  is  the  following. 


2k  + l|k  = S2k|k_i  < Bvk 


(3.  1-23) 


^k=  H^k|k-1 


(3. 1-24) 


The  steady  state  gain  and  covariances  are  given  by 


= P iit  Jhph1  + rJ 


(3. 1-25) 


P = <5> 


hn] 


PO  4 Q 


(3. 1-26) 


V is  called  the  innovations  sequence. 


•?k  = ^k  - 


— k | k- 1 


(3. 1-27) 


Under  steady  state  the  innovation  sequence  v^  is  a stationary  white 

Gaussian  sequence  [6l]  . The  original  problem  of  estimating 

e = (0,11,  R0,  Qo)  can  be  reformulated  as  (where  R^  and  arc 

the  steady  state  values  of  R,  and  Q,  ) that  of  estimating  the 

k k 

parameters  (0,11,15)  . The  following  theorem  can  now  be  stated: 


Theorem  3.  1.3  [7t'j  I .el  the  system  ( S ) 


satisfy  the  following 


conditions: 

(i)  The  eigenvalues  of  <&  are  within  the  unit  circle. 

(ii)  («.H)  is  an  observable  pair. 

(iii)  ('I*,!!)  is  a controllable  pair  viz.  the  matrix 


|^B,  Ii] 


(3. 1-28) 


has  rank  n . 

Then  the  linear  system  (S)  is  identifiable  up  to  its 
state  Kalman  filter  representation  . 


equivalent  steady 


"Regardless  of  the  identification  algorithm  if  two  systems  of  the 
form  (S)  have  the  steady-state  Kalman  filter  with  the  same  impulse 
response  and  the  same  innovations  covariance,  then  they  cannot  be 
resolved  using  stead-state  measurements.  For  a given  system  (S) 
there  may  be  many  steady- state  Kalman  filters  with  identical 
impulse  response  and  innovations  covariance.  Thus,  without 
additional  structural  constraint  the  model  is  in  general  not 
identifiable."  In  the  steady-state  form  (3.  1-23)  - (3.  1-24)  if  the 
matrices  (<&,H,B)  are  in  a unique  form  (that  is,  the  unknown 
parameters  B)  must  be  invariant  under  state  coordinate 


transformation)  then  the  steady-state  Kalman  filter  is  identifiable. 
Some  canonical  forms  for  multivariable  systems  are  given  by 
Weinert  and  Anton  [83].  The  conditions  of  controllability  and 
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r 


1 

t i 


J 


J 


u 


and  observability  will  make  the  parameters  of  the  system  matrix  $ 
not  all  independent.  In  this  case  the  system  can  be  reduced  to  a 
unique  canonical  form  as  follows. 


.Theorem  3.  1, 4 fo2 , p So]  Given  a dynamic  system 


*k  + i = 4^k+  ta* 


\ = 11 


(3. 1-29) 
(3. 1-30) 

where  the  apir  ($,h)  is  observable,  ($,£,h)  can  be  transformed 

a.  %».  %t, 

to  the  canonical  form  fl"  , h"  ) by  using  the  observability 

J-  *•':  *»; 

matrix  0 where  $ ' , jS  ' , h"  and  O are  defined  by 


$ 


0 

1 

0 

1 

1 

• I 

• 

| n- 1 

I 

0 

1 

J 

*1 

| ^ q •••  ••• 

, ^ n 

_ 

i 

(3.  1-31) 


£ " 1 '^2  ' **•  ’ ^n] 

hV  = [l  0 oj 


(3. 1-32) 
(3. 1-33) 


and 


O 


--  [i.T.  hT® 


(3. 1-34) 

© 


I he  canonical  structure  assumes  that  all  the  unknown  parameters 
in  the  $ matrix  are  identifable.  The  canonical  form  of  theorem 


■ 1 


(3.  1-4)  is  also  equivalent  to  a difference  equation  representation  as 
given  by  the  following  theorem. 

Theorem  3.  1.5  [52,  p 9 0]  Given  a difference  equation 


= V Q,  .8  . U 

/ j i k - i • • J i k~ i 


(3. 1-35) 


a. 

it  can  be  reduced  to  the  standard  canonical  form  (<p  ‘ , h , d ) given 


p 2k-i  + ± Vi 


(3.  1-36) 


Zk  * * 


(3. 1 -37) 


L\  o 


(3. 1-38) 


Reduction  to  canonical  forms  assumes  that  the  syrslcm  matrix  <P  is 
uniquely  represented  by  n parameters  which  are  identifiable.  The 
form  (3.  1-35)  is  also  referred  to  as  the  single- input  single-output 
(S1S0)  model.  In  time  series  analysis  these  are  referred  to  as  the 


autoregressive  moving  average  (ARMA)  models.  We  shall  consider 
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these  representations  in  Chapter  4. 

ihis  concludes  our  discussion  of  system  representation 
and  ident if iability . lor  more  thorough  discussion  the  reader  is 
referred  to  [6  2,  61,  76,  83], 

3.  2 Distance  Measure  For  Linear  Gaus  s ian  Systems 

Consider  the  linear  system  described  by  (3.  1-1)  and  (3.  1-2). 

For  this  Gauss-Markov  model  the  joint  conditional  density  function  of 
the  observations  Z is  a Gaussian  density  function  p(Z^'|0)  . The 


mean  and  covariances  are  given  by 


an  r N.  ■ 

z = e z |e 

r T n 

x — » N an  N aw 

£n  = E (Z  - Z1N)  (7.  - Z1  ) |e 


(3.2-1) 


(3.  2-2) 


For  a given  pair  (0^  , 0 ) in  © the  Bhattacharyya  distance  is  given 


The  covarance  matrix 


is  not  a function  of  the 


dctei  ministic  open-]oop  inputs  and  hence  the  second  term  in  (3.2-3) 
is  independent  of  the  inputs  U . Neglecting  this  term  we  have 


B = I (7N  . zN)J  J2  l (yN  7N> 
ay  8 -ft  -y  ' “ay  (-ft  “ -y  * 


(3.2-5) 


When  the  covariances  are  equal  (this  happens  when  the  dynamic 
noise  is  zero  and  the  initial  state  is  non  random)  such  that 


2 . L .2. 


0!N 


yN 


the  B-distance  has  the  simple  form 


B 


= I(zN-^N)  V'Sz N-yN) 
ay  8(-a  -y’  2^  '-ot  -y] 


(3. 2-6) 


]T  consists  of  the  observation  noise  covariance  R . The  following 

© 

two  theorems  give  the  expressions  for  the  B-distance. 

Theorem  3.  2.  I For  the  system  described  by  (3.  1 - 1)  - (3.  1 -S)  the 
joint  Guassian  probability  density  p(Z  |j9  ) is  given  by  the  mean  and 
covariance  functions  as  follows. 


where 


0 R n * 

- u U . . j 


£ ■ |E[(ik-*kMat-i*)Tli]| 


N . k-t 


Tlie  elements  of  the  covariance  matrix  are  given  by 

N 

E [<*k  - ik>  <*k  - lk)Tli]  = HPk  IIT  + R 

k=l,  2,  , N 

E l(-k  - -^k}  (^i  - A^)Tl®]  - upk'jl~k  ht 

k- 1,2,...,  N-l 


^-k  + 1, . . . , N 


Pk  = E [(-k  ' ^k’  (Sk  - Sk)T|e] 


generated  recursively  by 


Pk  =5>Pk-l  0 +Q  ’ k=1'2 N 


with  initial  condition  P . 


Proof.  Since  xQ  and  a re  Gaussian  random  variables  and 

2£k  is  generated  by  the  Gaus s -Markov  model  of  (3.  1 - 1 ) (x  ,...x  ) 

are  jointly  Gaussian  and  being  the  sum  of  Gaussian  random 

variables  IIx^  and  it  follows  that  ZN  jointly  Gaussian. 
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O 


3 


e[{%  ■ yy  i0] 

E j(H-k  + -k  ‘ “SrI 


r T i i 

■%t)  l£j 


(3.2-18) 


k < l 


x . is  written  in  terms  of  x,  as  follows 
— -o  — k 


't-k 


**  - 0 ^ +E  6 f"k+l-,+E  ‘ 


$ 


-k  + i-1 


i-1 


i-  1 


-t-k 


A ..  £-k  A V'"'  * 

U -*JL  ~ v (*k  - * J + X,  £j> 


- ^-k-  i 


w 


k + i-1 


i=l 


Hence 

E (*k  - *k)  (*<,  ~ £^)T|£  = H E [(xk  - £fc)  (xk  - xk)T[0]O4‘k  HT 


= HPk?UIlT 
k=l,  2, . . . , N -1 
l -k  -I  1 ....  , N 


(3.2-19) 


0 


Theorem  3.2.2  The  pairwise  distance  measure  between  the 


I IN  | 

densities  p ( Z |0  ) and  p(Z  [9  ) is  given  by 
a y 

v 4 i 0N-l'r  mT  £ 1 
ay  8 0 ay~  0 8 — ay  ay 


,N  i 


M U 
ay  ay  ay  — 


N- 1 


-1 


1N> 


N £ N 

a —y 


*-  “ii  * ; -x 

a u y y “ 0 


(II  $2  - H $2  )x. 

a a y y -0 


(H  $N-H  $^)x 

a a y y — o 


n " - i i vi  - 

a— a o y— y 0 


2 2 - i 2- i 

? (II  $ B -H  $ fi  )u. 

. , o;  a ~cif  y y ^-y  i- 

L-  1 


N 


Y (H  “H  0N'\S  )u. 

z-',v  a a —a  y y — y i 

i=  l 


(3. 2-26) 


This  can  be  written  concisely  as 


AM  AM 

Z^-Zy  = X0  + (Ma-My  >U 


N- 1 


(3.2-27) 


where  M is  given  by  (3.2-24).  Thus 

a 


N 1 


ay  8 


*0 


N-  1 


-1 


(M  - JVl  ) ‘ 

a y 


! l-*o 


+ (M  - ML  ) U 

a y - 


N-l 


-1 


.‘*0  +|uN-lTMTayE^MayUN-> 


ay 


-1 


1 T " N-l 

+ T Y K1  U 


(3.  2-28) 


w lie  r e 


i\l  = M - M 

ay  a y 


Note  that  B consists  of  three  terms  - a quadratic  term  in 

ay  1 

U , a linear  term  in  U and  a constant  independent  of 


The  average  B-distance  then  becomes 


B 


N=  EW  =/  / B*V  P<V  >><V  d V9- 


©x  0 


When  0 has  a discrete  distribution  with 


(3.2-29) 


P(®.  - . ® ) = P.  , i=l,  2,  . . . , m we  have 

_N  m-1  m 

B = 2£  E P.P.B.. 

7^i  I J tj 

l~]  j=i  + 1 


(3. 2-30) 


3.  3 The  Optimization  Problem 

The  form  of  the  optimal  inputs  depends  on  the  constraint  on 
the  inputs.  When  the  constraint  is  of  the  energy  type,  the  constraint 
region  is  an  N-dimensional  ball,  and  the  optimal  inputs  arc  given  by 
the  eigenvector  of  a symmetric  positive  definite  matrix  corresponding 
to  the  maximum  eigenvalue.  When  the  inputs  are  amplitude  limited 
the  optimal  sequence  is  controlled  by  a switching  function,  the  inputs 
saturating  at  their  maximum  values  giving  bang -bang  type  of  signal. 
Using  Bayes'  estimation  scheme  it  is  shown  that  for  a given  accuracy 
of  learning,  a certain  quadratic  form  must  be  nonnegative.  This  is  a 
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function  of  the  total  input  energy  and  the  matrices  that  are  functions 

O 

of  discrete  parameter  sets.  This  result  is  similar  to  that  of  Astrom 
and  Bohlin  [6]  where  the  inputs  are  required  to  be  persistently 

O 

exciting.  The  result  of  Astrom  and  Bohlin,  unlike  the  one  derived  in 
this  section,  is  a function  only  of  the  input  signals. 


3.  3.  1 Optimal  Inputs  for  Energy  Constrained  Case 
Consider  the  optimization  problem 


max  E„  B 

6 cay 


UN-J  c U 


U = {uN-!  | Hu*-1!2*  y} 


(3.3-1) 


(3.  3-2) 


form  the  Lagrangian  function 

L=  EelBayl  " x (II  uN_1  II2  - Y)  (3.3-3) 

where  X > 0 . The  necessary  condition  for  optimization  is  given  by 

[19,  p 70]  . 

i v„(EeIBj}-x^N''=  0 <3-3-4> 

Using  (3.2-20)  the  above  condition  becomes 


1 

8 


E, 


M 


T 

<vy 


N- 1 = 0 


(3.3-5) 


4 

1 


s . 


I. cl  xq  ~ O then  X_ ^ = O and  (3.3-5)  simplifies  to 


* I K°  [ 


T - 1 
M „ 2 M 
ay  ay  ay 


uN-!=  XuN-] 


(3. 3-6) 


The  maximum  of  the  quadratic  form  is  achieved  when  the  RITS  in 

N_  I 2 

(3.3-6)  has  its  maximum  value  which  is  X |(u  ’ II  = X Y 

max.  11  max. 


Hence  it  is  clear  that  the  optimal  input  sequence  is  the  eigenvector  of 
the  matrix  — | Eg [M^y ^ay  May]  | corresponding  to  its  largest 


eigenvalue  X . Thus 

max. 


(3.3-7) 


where  _e  is  the  normalized  eigenvector  corresponding  to  X 

m ° max. 

For  large  N , direct  evaluation  of  the  eigenvectors  is  time 
consuming  and  in  Appendix  C we  present  a numerical  method  based 
on  gradient  calculations  to  solve  the  optimization  problem  (3.3-1). 
In  actual  numerical  calculations  the  parameter  set  is  assumed 
discrete  taking  on  finite  number  of  values  and  the  quadratic  matrix 
Q is  calculated  using 


, m- 1 m 

O « 7 £ P.  P.  r:.1  M.. 

4 » J ‘J  IJ 

i=l  j-i + 1 


(3.3-8) 
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The  prior  probabilities  P arc  all  assumed  same. 


P.  = — , i = 1,2, 

1 m ' ' 


m and 


E V*- 


3.  3.  2 Parameter  Estimation 

The  minimum  mean-square  estimate  of  6 based  on 


observations  |zk  j is  given  by  [72,  p 227] 

4=  e[0|  zk]  -/  0pL0lzk)d£ 


(3.3-9) 


0 c e 


I k 

The  a posteriori  density  p(0_  J Z ) is  given  by  the  Bayes1  rule 


p(e  i zk ) = -Big  wm 

f P(Zk  |0)  p(©)  d0 


, k=lf  2, (3.  3-10) 


e c e 


Recursively  (3.3-10)  can  be  written  as 


k p(2 j/"1  , 0)p(0  |zk-1) 

p(0  | Z ) = 

f p(2klzk_1 , e_)p(0  |zk_1)d0_ 


(3.3-11) 


I 


When  0 has  a discrete  distribution  with  prior  probability 


P(0_  = ^ 6_ ) - P.  , i=  1 , 2,  . . . , m then 


m 


o , = V .0  p(.e  |z  ) 

— k i“  i—  — 


(3. 3-12) 


i=l 


The  a posteriori  probabilities  P(J)\z  ) are  recursively  updated 
using  the  Bayes'  rule. 


p(=Lklzk_1  , .0)  P(.e|  z*'1) 

m 

£ pfejz  1 • |z  fc'‘> 


k-1 

r7  ' 


P(.ilzk)  = 


(3. 3-13) 


j = l 


1=1,2,-..,  m;  k=l,2, ... 


The  initial  conditions  are 


P^Gjz0)  = P.  * i=l»  2, m 


(3.3-14) 


Braverman  [ 1 7 , p 29]  showed  that  the  a posteriori  probability 
corresponding  to  the  true  set  of  parameters  converges  to  1 as 
N *♦  “ . That  is  P(^®_|z^  ) 1 when  ^0_  = ^ . 
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3.  3.  3 Accuracy  of  Learning -Condition  on  the  Input  Sequence 

For  the  linear  Gaussian  system  we  obtain  a relationship 
between  the  accuracy  of  parameter  learning  and  the  input  signal-to- 
noise  ratio  (ISNR).  The  accuracy  of  learning  is  defined  as 


c = l - P(0olzN) 


(3. 3-15) 


where  6_  is  the  true  parameter.  The  condition  is  derived  ns  the 


1 
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positive  definiteness  of  a quadratic  form  involving  inputs 
Consider  the  following  dynamics  with  scalar 


observations. 


^k-1  + - \-l  ' -0  ‘ ^ 

(3.  3-16) 

2k  = 

T 

h xk  + vk  , k=l,  2, . . . , N 

(3. 3-17) 

vk~ 

G(0,  a 2 ) , k=l,  2,  . . . , N 

(3.  3-18) 

input 

Vk 

signal-to- noise  ratio  as 

IS  NR  = 


(3.3-19) 


l!uN_1|| 

N 

v « 

vk 

k=l  K 

Suppose  we  require  the  following:  For  a given  N (large  but  finite) 
and  e > 0 (c  sufficiently  small)  we  require  P(9Q|zN)  > 1 - C 

The  following  theorem  shows  that  there  is  a value  of  ISNR  satisfying 
this. 

Theorem  3.3.1  Consider  the  system  described  by  (3.  3-16)  - 

(3.3-18).  For  a given  finite  intergal  N and  C > 0 (€  sufficiently 

small)  there  exists  a value  of  ISNR  such  that  for  all 
1 N 

n N # P(9_0I  Z ) > 1 - c , where  6_  = ^()  is  the  true  value  of 

the  parameters. 


This  is  governed  by  the  inequality 


1 

2(m-  1 ) 


£ »uN-1» 

i*  4 mT  . 

■C  i 


> 


m2  2 (3.  3-20) 


0 


Proof:  See  Appendix  D 

The  required  accuracy  in  learning  can  be  achieved  by  using 
a proper  ISNR  above  a certain  threshold  as  dictated  by  the  inequality 
(3.3-20).  The  above  condition  can  be  treated  as  a sufficient  condition 
for  the  estimation  error  to  be  less  than  a given  fidelity  criterion. 


Assuming  that  the  observation  noise  covarianc?  is  fixed, 

N 1 2 

for  a given  total  input  energy  Y = f(u  ||  , it  may  not  be  possible  to 

attain  this  accuracy  since  the  ISNR  given  by  (3.  3-19)  need  not  satisfy 
the  conditions  of  the  above  theorem.  The  theorem  requires  that  the 
input  energy  be  variable  for  a given  C and  N . Conditions  analogous 

O 

to  the  above  are  given  by  Astrom  and  Bohlin  [6]  and  Aoki  and  Staley 
[3],  to  be  satisfied  by  an  input  sequence  for  the  maximum  likelihood 
estimators  to  be  efficient.  These  are  stated  exclusively  in  terms  of 
sample  correlation  of  the  input  sequence. 

In  Chapter  4 where  we  consider  stochastic  inputs  we  )eed 
to  impose  certain  mild  restrictions  on  the  spectrum  of  the  signals. 
These  requirements  primarily  characterize  the  input  signals  and  by 
the  positive  definiteness  of  the  corresponding  input  correlation  matrix. 
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the  condition  of  "persistently  exciting"  inputs  is  automatically 


satisfied. 


3.  4 Characterization  of  Open-loop  Inputs  for  Linear  Systems 

The  open-loop  signals  derived  in  Section  (3.3)  can  be  approximated 
by  linear  difference  equations.  When  the  unknown  parameters  are 
contained  in  the  gain  vector  £?  , the  inputs  satisfy  a second  order 
difference  equation.  When  the  unknown  parameters  are  also  contained 
in  the  state  transition  matrix  such  a representation  is  not  explicitly 
possible.  Here  the  optimal  inputs  can  be  adequately  represented  by  a 
low  order  difference  equation.  The  numerical  fit  is  obtained  by  a 
least  squares  determination  of  the  parameters  of  the  difference 
equation  having  the  form 
s 


4iVi 


(3.4-1) 


i=l 


It  is  shown  that  for  a simple  linear  regression  type  system  the  optimal 
input  corresponds  to  the  eigenvector  of  a banded  Toeplitz  mati  Lx.  In 
this  case  the  input  sequence  naturally  obeys  a linear  difference 
equation. 

3.4.1  The  Linear  Regression  Problem 

The  linear  regression  model,  because  of  its  special 
structure,  gives  rise  to  closed  form  solution  for  the  optimal  input 
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problem.  Consider  the  following  dynamics: 


°i  Vi+i + vk  • k=1-2' 


(3.4.?.) 


vk  ~ G(0,  o ),  k=l  ,2,  . . . , N 


(3.4-3) 


e a (0  , e, .....  e ) 

— — VC.  p 


(3.4-4) 


Then  for  N stages  we  can  write 


ZN  = A 0 + V N 


(3.4-5) 


where  A is  Nxp  matrix. 


A "2  U1 


(3.4-6) 


u u u , 

3 2 1 


u vi  u 

p p-1  p-2  u 


UN  UN-1  UN-2  UN-p  I 1 


N | 

The  joint  density  p(Z  |0_)  is  Gaussian 


p(Z>>  = (ZTT)  2|Rn|  2 exp^-~  (Z*  - A9_)T  R^1  (ZK  - A0  )"| 


1 ,..N 


. T _ - 1 ,„N 


2 ? 2 

where  R,A  diag  (CT  a ) 

N — v v v 


The  pairwise  distance  measure  is  given  by 


V=“~r  >T  aT  a - ?-y  i <3-4-’> 

8 O 

v 

N 

This  can  be  rewritten  as  a quadratic  in  U such  that  the  quadratic 
matrix  is  a function  of  the  parameter  values. 


Consider  the  case  p-2  . Then  (3.4-9)  can  be  written  as 


„ 1 ttN  . N-l 

Boy=  — 2 U Q U 

8 a 

V 


where  Q is  a tridiagonal  matrix  having  the  form 
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a = Ec  [(0,  -6  )2  ..  (0  -0  )Z] 

Gccfjy  L 1'/  2a  2'/  J 

b = Ec  ["(6,-6,  ) (9  -0  )] 

a6y  - lfY  ly  2ct  zy  J 

a'=  E%ey  [tei«-eiy  ,Z]  (3-4-14) 

Finally  the  distance  measure  has  the  form 

•«S[BaJ  = 2N  d»"  (3.4-151 

The  matrix  Q is  an  almost  toeplitz  form  except  for  the  last 
element  q a'  . The  eigenvector  of  Q corresponding  to  any 

eigenvalue  can  be  computed  as  follows: 

(a  - X ) u j + b u = 0 

b u.  , + (a  - X)  u.  + b u.  ,=  0 i=2,3,...,N-l 

i- 1 l l+l 

b UN-1+  (a’  ' X)  UN=  ° (3.4-16) 

We  see  that  except  for  i-1  and  N the  remaining  components  are 
generated  by  the  same  order  difference  equation.  When  the  number  of 
parameters  p > 2 , the  order  of  the  difference  equation  increases  and 
the  approximation  becomes  less  accurate.  This  special  case,  however, 


3 


illustrates  that  a difference  equation  representation  of  the  optimal 
inputs  is  feasible.  The  next  section  is  devoted  to  the  representation  of 


r 1 

j 

• I 
1 


i i 


O 


1 


L.  i 


optimal  signals  in  dynamic  systems  described  in  Section  3.3. 


3.4.2  Generating  Function  for  Optimal  Signals  in  Linear 
ni  namics 

Discrete  optimal  control  theory  is  applied  to  the 
characterization  of  open-loop  inputs  to  show  that  when  the  unknown 
parameters  are  contained  in  the  gain  vector  the  input  sequence 
satisfies  a second  order  difference  equation.  The  generalization  to 
the  case  when  the  state  transition  matrix  contains  unknown  parameters 
does  not  yield  such  a result  because  of  the  difficulty  in  manipulating 
the  equations.  In  this  situation  the  signals  can  be  approximated  by 
low  order  difference  equations. 

Consider  the  following  scalar  discrete  time  system: 


*k=  yxk-i ^ uk-i 


(3.4-17) 


z.  = h x,  + v,  , k=l,2, ....  N 

k k k 


(3.4-18) 


Let  us  consider  a discrete  distribution  for  6 - j5  taking  values 


0 = 0..  1=1. 2, , m with  equal  prior  probabilities 


P.  =~  , i=l,2, 

i m 


m 


The  following  optimization  problem  is  staled  for  the 
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above  case: 
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N 


max  13  - max 


U 


N- 1 uN-l 


rn  - 1 m 

E E (E  <*,„•  v2 


„ 2 2 

2m  a . . , . \ , , 

v 1 j-i  + 1 k=  1 


(3 


subject  to  the  linear  constraints 


m 


Xik  = 9 xi,k-l  + ^i  uk-l  ’ l"1,2,  ** 

k~l , 2,  . . . , N 


(3 


and  the  input  constraint 


{ uN-'l  »un-’||2s  r) 


Define  the  Hamiltonian  H as 


H - 13 


-X 


/ N-l  \ m / N - 1 

l EvJ)(E  [E 

1=1  i=l  k=0  7 


(3, 

© 

(3. 


■where  p , i-1,2,  ....  m ; k=0,  1,...,  N-l  are  the  costate 
multipliers  and  X is  the  Hagrange  multiplier,  X > 0 . 


Further 


f ..  » u,  ) “ x.  - x.,  = (O  - 1 ) x + 8 » 

lk  k k l,  k + 1 ik  ik  i 1 


i 1,2,...,  m \ k— 0,1,...,  N - 1 ( 3 . 


Following  Canon,  Cullum  and  Polak  [l9.  pp  75-84]  the  necessary 
conditions  for  the  discrete  optimization  problem  defined  by  (3.4- 
(3.4-21)  are  given  by  the  following  set  of  equations: 


.4-19) 


.4-20 


4-21) 


4-22) 


4-23) 


19)- 


The  result  for 


rn>  3 follows  from  this  ease 


Then  equation  (3.4-24)  and  (3.4-25)  take  the  form 


1. ^ . L. 


afik<xk-uk> 


^ - 1 g:  , wc  hav 


Plk‘P],k  + l ' “pl,k  + 1 + 2 2 


(Xlk-X2k'  + ^Ik-’W 


P2k"P2,k  + l =aP2,k+l  4 TT 

m a 

v 


(x2k-x3k)  ' ‘’■lk'^k1 


P3k*p3,  k + 1 ' “p 


3,  k+  1 2 2 

m a 

v 


l(xlk-x3k)  + (x2k'x3k}  (3.4-28c) 


k=0,  1,  . . . , N - 1 


From  (3.4-27)  we  have 


•2X(Uk-Uk+l,+Sl(plk-pl,k  + l,+,,2<p2k-p2,k+  1>  +P3(p3k-p3,k+l)=° 


Using  (3. 4-28)  in  the  above  gives 


-4v"k  + i>  + 8i<aP>,k  + i»  4 ~^Tz  hk-*2k'  4 ^ik-^k’l) 

v 

+ P2(8p2.kkI  + -£r  ['x2k-x3k>  ' ,xlk'x2k,l) 


2 2 
m o 

v 


^(“Ps.k+r— iil-  ['xJk-x3k>  4 (x2k-x3k,l) 


2 2 
m cr 

v 


Also,  putting  k-k -t  1 in  (3.  4-29)  gives 


(3.4-29) 


k + 1 k + 2;  IV  pl,k  + 2 TT  P 1 . k + 1 'XZ.  k + 1 >J ) 

rn  a 

v 


+ (xi,k+rx3.k  + i) 


■>sz(' 


«p. 


Kk 


2^2.  k +2*  m2a2  rz,k  + rx3,k+l)  ' {xl,k+rx2 

V 


.k+l5) 


^K, 


m a 


k+2  2 2 |'“l,k  + rX3,k  + l)  + (x2.k  + l"X3fk  + l,j) 


(3.4-30) 


Subtracting  (3.4-30)  from  (3.4-29)  and  using  (3.4-27) 


-2X!V2uk  4 1 + \ 4 2>  + 2 KX("k  4 1»-Uk  4 2> 


2 2 
m a 

v 


S1  (2(xlk-xl.k  + l)-(x2k-*2,k  + l,-(x 


3k'X3,k4  1* 


2 2 
m a 

V 


Bz  2(x2k'x2,k  + l)_(x 


3k  3,  k + 1 


>-<xik-xi.k  + i>| 


2 2 
m CT 

v 


* 3 [2,X3k-x3.k  + l><xlkxl.  k 4 1>  ^x2k"x2,  k 4 1 ' I 


(3.4-31) 


Using  (3.4-26)  this  can  be  simplified  as 


Ai+\ |<VV/  + <V/V2  + ,VV2i 


4 ^1  [(Xlk‘It2k)  4 '^k-^k'l  4 «B2  |<*2k-X3k»  4 <*»-*, k>l 


+ «fl3  <-'t3k-Xlk)4(x3k-X2k)l  ■=  0 


The  terms  in  (3.  4-32)  are  related  to  (3.  4-29).  Further  simplification 
yields 


j 2 2 

£<Va2,  k ^ IVik^z^Vsk-'Vi.k+i^v 


2 2 

I m O' 

-<a1+a2)  4 — f- 

h 


[2X(uk-(i+cy)uk+1)j  = 0 


(3. 4-33) 


The  last  identity  is  obtained  from  (3.4-27). 

. 2 2 

a 2X  22  , , 2 Of  X m O 

A = ~r  m a u - 2u  , , + u,  J + (u  - u 1 

1 h2  v k k+1  k + 2>  2 lUk+l  Uk  + 2 


A2  = 2„k(^  + p22  3 0*  - «i/?2  . 0z03  . t>30  ) 


(3.4-34) 


(3.4-35) 


So  (3.4-33)  can  be  written  as 
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\ " {lVl  +{2Uk-2  ’ k"1-2 N‘ 


(3. 4-36) 


The  Lagrange  multiplier  X is  finally  obtained  when  the  inputs  satisfy 
N-l 


£ < 


Y . Thus  we  have  the  following  theorem. 


Theorem  3.4.  1 For  the  scalar  system  described  by  (3.  4-17)- 

(3.4-18)  the  optimization  problem  of  (3. 4 - 1 9) - (3.  4-2 1 ) has  the 


solution  U 


= (Uq  ,Uj  UM  , ) satisfying  the  lineal 


difference  equation. 


uk  = + C2uk_2  , k=l,2,...,  N-l  (3.4-37) 


with  inertial  condition  u ^ £ 0 and  u-  0 for  i > 0 . The 
parameters  and  are  functions  of  the  discrete  parameter  values 
/3.,  i=l,  . . . , m and  the  Lagrange  multiplier  X . 

El 

In  Section  (3.  6)  we  present  examples  where  the  optimal 
input  for  multistate  systems  is  approximated  by  difference  equations 
upto  forth  order  using  least  square  fit  to  determine  . 


3.  5 Numerical  Examples 

The  examples  discussed  here  bring  out  the  salient  features  of  the 
signal  design  problem.  We  consider  two  different  examples.  A fourth 
order  state  space  dynamics  and  a single-input  single-output  system. 
Two  examples  are  considered  in  the  latter  case. 


o 


3 • • 1 f ourt)i  Order  System  with  Unknown  I~*a ra meter s in 
$ matrix 

The  following  dynamics  is  considered. 


*k+l  = + *uk  +I>k 


2k  = ± + Vk 


(3.5-1) 

(3.5-2) 


matrixes 

above  hav 

e the 

following 

nu  me 

0 1 

0 

0 

1 

4>  = 

0 0 

1 

0 

1 

0 0 

0 

1 

2 

-.66  .78 

18 

1 

1 

_ 

— 

(3.  5-3) 


T = (1  1 2 I)' 

hT=  [1  0 0 0 ] 


2EQ  ~ G ( 0 , 1 01 ) . v ~G(0,  .25) 

w,  - G(0,  ct2  ) 

K W 


The  number  of  stages  M = 50  and  the  discrete  parameter  values 
m = 3 . The  following  discretizations  are  used 


J 


P 

o 

p 

P 

1 

2 

3 

4 

- . 66 

CO 

-.  18 

1. 

75 

. 85 

3 

1.  19 

24 

. 26 

. 18 

.45 
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The  solution  to  the  estimation  problem  defined  here  consists  of  m 
Kalman  filters  in  parallel.  The  parallel  processing  scheme  is  shown 
in  fig.  (3.5-1).  This  is  similar  to  the  parallel  computation  techniques 
used  by  several  investigators  (see  e.  g.  Alspach[lJ). 

The  optimal  inputs  are  obtained  for  different  values  of  the 

2 2 

plant  noise  covariance  O ; O = 0 , .1,  .25  are  used.  Comparison 

w w 

of  the  estimates  is  made  using  the  Bhattacharyya  distance  and 

2 

divergence,  as  a function  of  the  noise  covariance  a^.  The  B-distance 
is  given  by 


BN  =iuN-'TM  iWi>N  } M UN-' 

ay  8—  ay  { 2 j ay  — 


(3.5.4) 


where  ^ and  E^^  are  as  in  (3.  2-28);  (3.  2-24)  and  (3.  2-9)- 
(3.2-13)  respectively.  The  pairwise  divergence  has  the  form 


jn  = iuN-J 

ay  2 - 


(E-'„  U-1  ) 
\ a N y N J 


M U 
ay  - 


(3.5-5) 


The  optimal  energy  constrained  inputs  are  obtained  numerically  using 
a gradient  technique  outlined  in  Appendix  C. 


Figures  (3.  6-2a)  and  (3.  6-2b)  show  the  optimal  inputs 


for  CJ^  = .1  with  B-distance  and  divergence  criterion  respectively. 

2 

Figures  (3.  6-3a)  and  (3.  6-3b)  are  the  optimal  inputs  for  CT  = 0.  25  . 

w 

2 

Figure  (3.  6-4)  shows  the  optimal  inputs  for  (T  = 0 . The  presence 

w 

of  plant  noise  has  a marked  influence  in  changing  the  signal  shape. 


1 


...... 


The  learning  with  optimal  inputs  is  compared  with  nonoptimal  pulse 
inputs  because  of  its  resemblance  to  the  optimal  inputs. 


The  convergence  of  estimates  of  6 - ('•?  .O  O >0  ) is 

— 12  3 4 

2 2 

compared  in  figs.  (3.  5-5)  and  (3.  5-6)  for  CT  = . 1 and  CT  = . 25 

w w 

respecitvely.  Stagewise  normalized  error  ||£-$  ||^/||0-9  j]  ^ is 

k 0 

plotted  against  the  time  step  k . Comparison  of  the  learning  with  the 

2 

two  signals  as  a function  of  the  noise  covariance  O is  of  importance 

w 

2 

in  this  example.  For  a = . 1 the  results  are  almost  identical.  When 

w 

9 2 
the  dynamic  signal  power  is  increased,  that  is,  when  CT  = . 25  , the 

w 

B-distance  yields  an  improved  performance  compared  to  the 

divergence  (see  fig.  (3.5-6)).  This  is  true  at  least  for  the  case  when 

the  prior  probabilities  are  all  equal.  A similar  conclus ion  was 

obtained  in  an  entirely  different  problem  of  choosing  optimum 

communication  links  in  Gaussian  processes  with  unequal  covariance 

[41].  The  pulse  input  exhibits  less  effective  learning  compared  to  the 

optimal  signals,  thus  showing  that  the  modulation  of  the  signals  as 

obtained  by  approriate  syntheses  has  amarkcd  influence  on  the  learning 

time  and  the  accuracy  of  learning.  The  normalized  estimation  errors 
2 2 

for  CT  = . 1 and  CT  =.25  are  listed  in  table  (3. 5-1). 
w w ' ' 


Table  (3.  5-1) 


Comparison  of  Normalized  Errors 


li  e - & i!2/ 1!  0 . § ii2 

A\  ’ — 0 


Input 


cr2  =.l 


o = .25 

w 


B -distance 


. 512  X 10' 


1.  953  X 10' 


Divergence 


1.012X10' 


2.  914  X 10' 


Pulse  Input 


. 917  X 10' 


10. 679  x lo' 


White  noise 


.654  X 10' 


8. 89  x io'3 


For  Q-.25  the  B-distance  criterion  has  60%  of  the  input  energy 
between  stages  21  and  40,  while  the  divergence  has  43%  of  the  input 
energy.  Thus  the  B-distance  criterion  has  larger  portion  of  the 
input  energy  during  the  second  half  of  the  input  history. 


f 


rs 


L 
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3 • 5 " 3 Single-Input:  Single-Output  Systems 

Here  we  consider  the  following  discrete  form  of  the 
scalar  input-output  system  given  by 


m 


£t  ~ 'y  1 P . x.  . + \ ' y?  u 
k i k-i  ^ i k-i 

i=l  i=l 


(3.5-6) 


2k  = xk  + \ > k=l,  2 N 


(3.5-7) 


v ~ G(0,  o ) 

K V 


(3.5-8) 


Two  examples  are  considered. 
Example  (1)  Second  Order  System: 


Xk^lXk-lf^V2^\.l 


(3.5-9) 


_§  = (Pj»  <P2-  8 ) 

The  actual  values  of  the  parameters  are  P = ] 5 p = 

1 ' 2 

This  system  has  poles  at.  . 75  ± j.  37  in  the  Z-plane. 
Example  (ii)  First  Order  System: 


-.7,  iff  = 1. 


Xk^Xk-l  +5Vi 


(3.  5-10) 


1 = (P  , |8  ) 
with  p = 7,  6 = 1. 


Total  number  of  stages  N = 20  and  m = 3 . The  discrete  parameter 
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i 


values  are  taken  as  follows: 


°1  P2  * 

1.5  -.7  1. 

1.1  -.4  .6 

1.6  -.8  1.1 

Figures  (3.5-7)  and  (3.5-8)  show  the  optimal  inputs  for  second  order 
and  first  order  systems  respectively.  Difference  equation 
approximation  of  the  optimal  input  sequences  is  obtained  for  each  of 
the  above  two  cases.  Figures  (3.  5-9)  and  (3.  5-10)  show  respectively 
these  results  for  the  two  dynamics.  The  difference  equation  has  the 
form 

s 

Uk=E  hVi  (3.5-11) 

i- 1 

Second,  third  and  fourth  order  fits  are  displayed  in  the  above  figures. 
Tables  (3.5-2)  and  (3.5-3)  show  the  comparison  of  the  cost  functions 
for  each  of  the  above  systems  and  the  corresponding  parameter  values 


1.  54994 


1. 29982 


76833 


31829 


34751 


Table  (3.5-2) 

Second  Order  System-Comparison  of  Di 

fference  Fouation 

Approximation  to  the  Optimal  Inputs 

Second  Order 

Third  Order 

Fourth  Order 

Opt  imal 

Fit 

Fit 

Fit 

Input 

Cost  Function  72.3134 

78. 1179 

78. 7619 

80. 2267 

Tabic  (3.5-3) 


First  Order  System 


Comparison  of  Difference  Equation  Approximation 


to  the  Optimal  Inputs 


Second  Order  Third  Order  Fourth  Order  Optimal 
Fit  Fit  Fit  Input 


Cost  Function 


19. 2525 


19. 6215 


19. 863 


1. 27082 


1. 21546 


1.  18881 


The  a posteriori  probabilities  P(6^|Z  ) are  compared  in 

Table  (3.  6-2)  for  the  second  order  system  with  optimal  input  and  the 
inputs  from  second,  third  and  fourth  order  fits.  A significant 
difference  is  seen  from  the  second  order  to  the  third  order  fit. 
Thereafter  the  improvement  in  learning  is  gradual  as  is  the  increase 
in  the  respective  cost  functions. 

A comparison  of  the  learning  performances  v'ith  optimal 
inputs  and  nonoptimal  exponential  inputs  of  equal  energy,  exhibits  the 


superior  learning  with  optimal  inputs.  Growing  exponential  inputs  are 


used  1 ) y I.itman  and  Huggins  [55]  in  identifying  transfer  function 
models.  In  this  example  both  the  transition  parameters 
the  gain  parameter  are  treated  as  unknown  quantities.  Table 
(3.  5-6)  compares  the  learnings  with  optimal  input  and  exponential 
input. 

Table  (3.  5-4) 

Comparison  of  the  Learning  Perfo r ma nc.es 


for  the  Second  Order  Svstem 


1mnII2/||(M0II2 

Optimal  Input 

5. 7546  X 10'4 

5. 3843  X 10'4 

Exponential  Input 

4.  8278  X 10‘2 

4.  8449  X lCf  2 

I he  superior  learning  with  optimal  inputs  is  clearly  demonstrated. 
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FIGURE  3.  5 -2a 

Optimal  input  with  Bhattacharyya  distance  criterion; 
2 

Example  1,  a =.  1 . 

w 


1 


I 

i 

FIGURE  3.  5-2b 


Optimal  input  with  divergence  criterion;  Example  1, 


FIGURE  3.5-9 


Approximation  of  optimal  inputs  for  second  order  system 
by  difference  equations:  a)  second  order  fit;  b)  third 
order  fit;  c)  fourth  order  fit. 

Optimal  input 


Difference  equation  approximation 


r 


” 11  1 1 ■ 111,1  — — 


16$ 


o 


ll 


FIGURE  3.5-10 

Approximation  of  optimal  inputs  for  the  first  order 
system  by  difference  equations:  a)  second  order  fit; 
b)  third  order  fit;  c)  fourth  order  fit. 

□ Optimal  input 

O Difference  equation  approximation 
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CHAPTER  IV 


SYNTHESIS  OE  STOCHASTIC  INPUTS 


GENERATED  BY  LINEAR  MODELS 


In  this  chapter  an  alternate  approach  to  the  signal  design 
problem  is  presented.  Here  linear  stochastic  processes  are  used 
to  generate  inputs  for  identification  problems.  Such  signals  can  be 
treated  as  linearly'  filtered  white  noise.  A degenerate  case  of  this 
class  of  signals  is  a pure  white  noise  sequence.  Considerable 
improvement  in  learning  can  be  achieved  by  using  inputs  that  are 
correlated.  The  stochastic  signals  in  this  chapter  are  generated 
from  the  linear  stochastic  processes  of  the  mixed  autoregressive 
moving  average  type  (ARMA(p.q)  ) given  by 


P q 

u.  =r«.u,  . + v - v*  f . v . 

k /—j  i k - i k 1—4  D k - 1 

i- 1 i=l 


(4.  1) 


where  } is  a white  noise  sequence.  The  signals  thus  generated 
are  independent  of  the  observation  noise  and  the  resulting  problem 
gives  rise  to  a reduced  optimization  in  the  space  of  parameters 
( ) ensuring  stationarity  and  invertibility  of  the  processes. 

Choice  of  random  inputs  has  been  studied  by  Box  and  Jenkins 
[15,  pp,  4 16-420].  Here  a specific  example  is  considered  and  a 
first  order  autoregressive  model  is  used  to  generate  the  input 
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sequence.  Cumming  [22]  used  a synchronous  random  telegraph 
signal  in  studying  the  bandwidth  effects  on  the  identification  accuracy. 
Simulation  studies  showed  that  an  input  signal  bandwidth  of  about  one 
half  that  of  the  system  gave  the  best  results. 

Here  we  are  interested  in  generating  random  inputs  from 
linear  processes  and  comparing  the  performance  of  these  input 
signals.  Instead  of  merely  choosing  the  variance  of  the  white  noise, 
the  technique  developed  in  this  chapter  gives  us  the  additional  degrees 
of  freedom  of  choosing  the  input  parameters  via  the  optimality 
criterion.  For  each  type  of  system  various  inputs  are  tried  and  the 
one  that  gives  the  maximum  cost  is  selected  as  the  optimum  one. 

No  bandwidth  considerations  are  included  in  the  analysis. 

We  use  the  sensitivity  index  of  the  observations  as  the  cost 
function.  The  quadratic  nature  of  this  function  is  exploited  in  deriving 
the  property  of  optimal  signals.  It  is  also  shown  that  these  sensi- 
tivities are  related  to  the  Bhattacha ryya  distance. 

In  section  (4.  1)  a brief  discussion  of  the  stationary  processes 

is  given  and  the  optimality  criterion  for  the  single-input  single-output 

system  is  derived.  In  section  (4,2)  the  properties  of  these  inputs 

as  applied  to  linear  systems  are  studied  via  the  spectral  density, 

s (X)  of  these  signals.  The  input  sequence  u , k = 0,  1,  2,  ... 
uu  k 

is  asymptotically  stationary  under  certain  constraints  on  • 
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By  writing  the  correlation  matrix  in  the  form 

C =0?'MMr  (4.2) 

uu  V 

and  using  the  results  on  asymptotic  eigenvalue  distribution  of  Toeplitz 
matrices  it  is  shown  that 

TT 

lim  Trace  [MM^]  = — — I s (X)dX  (4.3) 

-T  ^ N L 2 I uu 

N-*00  2tra  J „ 

V 

2 

With  constraints  on  cr^  it  is  established  that  at  the  optimum  value  of 
the  cost  function  the  corresponding  effective  spectral  area  given  by 
the  above  relationship  also  has  the  largest  value.  In  section  (4.  3) 
examples  are  presented  comparing  the  performances  of  stochastic 
inputs  obtained  by  several  different  processes.  Numerical  results 
show  that  the  inputs  obtained  from  the  autoregressive  and  ARMA(1,  1) 
processes  give  better  learning  characteristics  than  the  moving 
average  process.  It  is  also  observed  that  if  the  parameter  space  is 
taken  to  be  small  surrounding  the  actual  parameter  values  then  the 
power  of  the  input  process  is  concentrated  in  the  region  where  the 
gain  of  the  system  is  high. 

4.  1 Input  Signals  Generated  by  Linear  Stochastic  Processes 

I. incar  stochastic  models  arc  briefly  discussed  giving  their 
salient  features.  The  system  dynamics  is  presented  in  terms  of 


single-input  single-output  (SISO)  form.  The  cost  function  is 
calculated  using  the  sensitivity  index  which  is  quadratic  in  the  input 
variables. 


4.1.1  Description  of  Linear  Stochastic  Models 

A general  linear  stochastic  model  supposes  that  a time  series 
is  generated  by  a linear  aggregation  of  random  noise.  Such  a process 
can  be  represented  by  the  autoregressive  moving  average  model 


1 u 

u,  = Y'  0!  u,  . + [/  - V*  £ V 

h ' 1 k-i  k ‘ ’i  k-i 


(4.  1-1) 


Using  the  backward  shift  operator 


\ = V. 


(4.  1-2) 


(4.  1-1)  can  be  written  as 


(1  (>  -£?V] 

• i *■ 


(4.  1-3) 


This  can  be  further  written  as 


uk=  (1  -£V’’V' (l  - £ 

i-1  i=l 


(4.  1-4) 


The  convergence  of  the  scries  *!>(B)  = (1  - £ a.B1)-1  ensures  that 

i=l 

the  process  has  a finite  variance.  This  is  ensured  by  the  condition 


on  or  that  the  roots  of  the  equation 


must  be  outside  the  unit  circle.  Thus  when  the  noise 


stationary,  the  above  condition  ensures  that  the  process  u is  also 


stationary.  The  condition  of  invertibility  is  concerned  with  recoverin 


V from  the  present  and  past  happenings.  Writin 


the  linear  process  is  invertible  if  the  infinite  series  expansion  of 


converges.  This  requires  that  the  roots  of 


the  equation 


lie  outside  the  unit  circle.  The  conditions  for  invertibility  are  inde 


pendent  of  the  conditions  for  stationarity  of  the  time  scries.  Thus 


we  see  that  the  conditions  imposed  by  the  roots  of  (4.  1-5)  and  (4.  1-7) 


restrict  the  values  taken  by  Cl  and  . In  the  followin 


we  briefly 


s these  processes.  For  a complete  discussion  see  Box  and 


discu 


The  spectrum  of  the  process  is 

s (f)  =0*  | 1 - £ e'i2Trf-  t e'i27Tqf  I2  |f  hi  (4.1-15) 

mi  V 1 q 


1 


i 


r\ 


Mixed  Autoregressive  Moving  Average  Process  (ARMA(p,q)) 
This  process  has  the  characteristics  of  both  AR(p)  and  MA(q) 
processes  and  is  the  general  form  of  the  stationary  time  series. 
This  has  the  form 


p q 

u.  = V Ot.u,  . + V - V i.v  . 
k 1 ' l k-i  k £—j  i k-i 

i=l  i=l 

The  correlation  function  is  given  by 


(4. 1-16) 


C 


P = a,  P . +a  0 _ + . . . +a  p . t * q + 1 (4.  1-17) 

t 1 t-1  2 t-2  p t-p 


There  arc  q autocorrelations  P , P P,  whose  values 

q q-1  1 

depend  directly  on  the  choice  of  the  q moving  average  parameters 
£ and  on  the  p autoregressive  parameters  a . The  spectrum  of 
the  mixed  process  is  given  by 


L i «L. 


o 

I . -i2irf  - i4 tt£  - i 2 tt c i f i 

> (f)  - o* — ^ " 

uu  l>  ~i 

I . -i2iTf  - i 4 tt f -i27iqf  ( ^ 

I 1 - a.c  -Or  c - ...  - a e 1 | 

12  p 


f I £ k 


In  the  next  section  we  derive  the  optimality  criterion  for  systems 
represented  by  input -output  dynamics. 


4.  I.  2 Sensitivity  Index  and  Optimality  Criterion 

We  define  the  system  equations  in  terms  of  the  single-input 
single-output  dynamics.  This  representation  facilitates  the  easy 
computation  of  sensitivity  index  and  comparison  of  the  results  ob- 
tained in  this  chapter  with  those  of  earlier  investigators.  (This  class 
of  linear  models  is  also  used  by  Box  and  Jenkins  [15,  Ch.  J 1]  in 
time  series  analysis.  ) The  quadratic  nature  of  the  cost  function 
is  exploited  in  deriving  the  properties  of  the  optimal  inputs. 

Consider  the  following  dynamics: 


y,  = P.  y . + y''  P.u  . + 

7k  *—•  l'k-i  l k-i 


n-°-  ui.i  = ° ¥JS° 


wR~  G(Op  0k)  , k = 1,  2 N 
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{ Uj_  } is  a zero  mean  random  sequence  uncorrelated  with  (w  } 

I V 


E { u^  w£  } = 0 ^ k,  l 


Define  the  following  NXN  matrices 


B = 
N 


'n-E  V* 

i- 1 


pi  'n+E 

i-Z 


<Vn 


r > 1 


r = 1 


(4. 1-24) 


S is  the  NXN  shift  matrix  with  elements  s =6  . The  vector 

ij  i-l.j 

N A t 

)L  ~ 'Y  y Y 2’  ’ • ' » yN)  can  he  v/ritten  as 


*N-nVN-,**n*N 


(4.  1 -25) 


lor  further  analysis  of  the  stochastic  input  problem  we  derive  the 
sensitivity  index  of  the  observations  with  respect  to  the  parameters 
_ (.£?  » H ) ^ R and  show  that  this  index  is  closely  related  to  the 

B-distance.  The  quadratic  form  of  the  sensitivities  is  useful  in 
obtaining  the  spectral  properties  of  the  input  sequence. 

The  sensitivity  of  the  output  y^  with  respect  to  0_  = (p,  P ) is 
given  by 


BO  =(yk-l yk-n’  Uk-1 Uk-r) 


(4. 1-26) 
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and  the  squared  index  is  given  by 


* T , 

dy.  oy  n r 

yk  yk  y 2 y>  2 

~ 'E  vk-i  * E l,k-i 


k 50  50 


i=  1 


i=  1 


The  average  cost  function  can  be  written  as 


( N 5 y 3 y ) 

J = E E £ kl 

' k=l  ~ 


50  j 


The  quantities  € , k = 1 , . . . , N can  be  computed  as  ; 

r 

form  in  the  input  sequence  tu  Jj_-o  as  follows. 


€i  =uo 


, _ 2 2 2 

C = V T U T U 

2 yl  1 0 


C = y?  + y2  + 

r yr-l  yr-2 


2 a 2 

+ yi  +Ur-1  +- 


• + VI 


2 2 

Cr  + 1 =yr  +yr-l  + 


2 2 

+ yi  +Ur  +* 


. + u. 


2 2 

c = y , + y , + • • 

n n- I n-2 


2 2 2 2 

+ y +u  + u +...+U 
1 n- 1 n-2  n-  r 


(4. 1-27) 


(4. 1-28) 

quadratic 


CN  ~ yN-l  + yN-2  + ' * 


2 , 2 
+ yN-n  +UN-1  + 


. 4 U 


N-r 


In  the  abo\  e we  have  assumed  n ^ r . So  J can  be  written 


as 


N ,p  >-p 

E,  N N N- 1 N-l 

ek  = X.  Wj  y_  + U W„  u‘N 


k-  1 


2 - 


(4. 1-29) 


The  matrices  Wj  and  W are  given  by 


W 


■ i = diag[  n,  n,  . . . n,  n-  1,  n-2,  ....  1 , 0 ] 


"Y“ 

n 


(4. 1-30) 


W = diag  [ r,  r r, 

v y ' 

N-r  + 1 


r - 1 , r - 2 , 
V 


r - 1 


(4.  1-31) 


Averaging  over  all  the  random  variables 


J = Ey,u[J]  (4.1-32) 

Using  (d.  1-25) 


J ■ '*nVn',+*nV)t 


N N 

+ uN-lTw?uN-' 


(4. 1-33) 


O 
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J = Efu1  ' 1 i Ef  ij  ~ 1 w?y  ) 


+ E [ WN  ^NJ  Wj  WNJ  (4.  1 -34) 


The  cross  terms  are  zero  since  [uk  ) and  (w  } are  independent, 
zero  mean,  random  sequences.  The  last  term  in  (4.  1-34)  is  inde- 
pendent of  the  input  variables  and  we  can  write 


J = Eu[yN-1  qu*-1] 


(4.  1-35) 


where 


° = Wn  wi*n1jW 


(4. 1-36) 


When  the  parameter  set  takes  on  finite  number  of  discrete  values 


we  have 


= E [UN_1  Ch  UN ~ 1 } 


(4.  1-37) 


and  the  total  cost  function  can  he  written  as 


m 

cE  E Js 

• • 1 


(4.  1-38) 


Eet  us  now  derive  the  B-distance  for  this  case.  The  pairwise 
distance  is  given  by 


c 


Lj  . . 

u 


>L 


£ . clet 
1 


J 


where  and  £ are  the  covariances  of 


N 


and 


N 

*j 


B £ in  — 
ij  N 


(4. 


Now 


1 

2N 


N In  — T race  ( 2)  . + £ . ) 


(4.  1 


The  covariance  matrices  L . are  given  by 

y _ „r  N N 1 , 

Si  ~ EUj  JLj  ] 


= E 


U>;T^  wN)l*NTe„  un-,  + 4>n1 

1 1 i i 


(4.  1 


1-39) 


-40) 


T 

} 


41) 


Since  the  first  term  in  (4.  1-41)  is  not  a function  of  {u  } 


we  have 
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The  matrices  in  (4.  1-42)  and  (4.  1-43)  differ  in  the  following  way  - 
the  weighting  matrix  Wj  is  a function  of  the  order  n of  the  system. 
The  quadratic  matrix  has  its  diagonal  elements  increased  by  the 
addition  of  the  diagonal  matrix  2 W , where  W is  a function  of 
the  order  of  the  moving  average  part  of  the  dynamics.  Within  these 
limits  the  sensitivity  cost  index  has  the  interpretation  given  by  the 
upper  bound  ( 1-40). 

Assuming  that  the  input  sequences  [u,  } is  stationary  the 

k k 

cost  function  can  be  written  as 


N-l 
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where  P,  ~ y ; / , k = 0,  1,  N-l  and  y is  the  variance  of 

k k 0 0 

the  inputs.  Assume  that  the  inputs  are  constrained  to  have  finite 
energy  ^ such  that 


y £ Y 
0 


(4. 1-45) 


The  p are  sums  of  diagonal  elements  of  Q and  matrix  given  by 

K 


N-k 

‘k  = E qi.i+k’  k = °'  1 N- 1 

i-l 


(4.  1-46) 


The  optimization  problem  is  that  of  maximizing  J with  respect  to 
parameters  of  the  input  process  - (Q^,  £_)  ' R^  * ^ . The  constraint 


region  for  » 4.)  is  such  that  these  parameters  must  satisfy  con- 
ditions for  stationarity  and  invertibility  of  the  given  input  process. 
For  a given  total  input  energy  the  optimization  problem  can  be 

written  as 


J = 


sup  ]T 

(a,  £_)  € £ k^0 


(4.  1-47) 


where  £ is  the  admissible  region  in  R ^ for  the  input  parameters. 
The  constraint  regions  for  input  processes  with  two  parameters  is 
given  below: 


AR(2)  Inputs 


uk=  V’k-1  +V‘k-2  * V "•  ■' 


(4. 1-48) 


MO 


141 


o 


When  [u  } arc  stationary  with  finite  energy  J is  maximized  by 
any  stationary  input.  In  this  particular  case  a white  noise  process 
will  perform  as  well  as  any  other  stationary  input  of  equal  energy. 


. i 


4 . 2 Characterization  of  Stochn  stic  Inputs 

The  stochastic  inputs  arc  derived  from  the  above  processes 
by  considering  u for  k ^ 0 . These  input  processes  arc  asymptoti- 

K. 

cally  stationary  when  the  input  process  parameters  satisfy  the 
conditions  discussed  in  section  (4.  1).  The  properties  of  input 
signals  as  applied  to  linear  dynamical  systems  are  studied  by 
considering  the  spectral  density  of  the  stationary  processes.  Using 
the  results  on  asymptotic  eigenvalue  distribution  of  Toeplitz  matrices 
the  input  correlation  matrix  can  be  approximated  by  the  corres- 
ponding spectral  density  of  the  stationary  process  in  the  sense  that 


1 ( T I 

lim  — Trace  E [ U U ] 
N-«  N ( L J) 


I Vf 


s (X)  dX 
uu 


(4.2-1) 


Furthermore,  if  we  fix  the  variance  of  the  input  white  noise,  C?  , 
by  writing  the  correlation  matrix  in  the  form 


2 T 
C =a  MM 
uu  V 


(4.  2-2) 


it  follows  that 


rr  Trace  [MM  ] = r / s (X)  dX 

' 2 / uu 

2ttcx  *■  -ir 


N 


lim 

N-* 


_ 2 
2vav 


(4.2-3) 
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Thus  for  this  case  at  the  maximum  value  of  J the  input  process 
is  such  that  it  has  the  largest  value  for  t lie  effective  spectral  area 
in  the  sense  of  (4.2-3). 

The  results  on  Toeplitz  asymptotic  theorems  used  here  have 
been  applied  by  Gray  [31]  in  deriving  information  rates  of  ARMA 
processes.  This  was  the  first  time  that  such  theory  was  applied 
in  engineering  problems.  Even  though  the  results  of  the  following 
sections  related  to  the  Toeplitz  matrices  are  not  new,  the  application 
of  such  theory  in  the  present  context  has  enabled  us  to  obtain  new 
insight  into  the  behavior  of  random  input  signals.  Such  a charac- 
terization has  many  attractive  features  for  theoretical  as  well  as 
numerical  studies.  We  will  use  some  basic  theorems  on  asymptotic 
eigenvalue  distribution  of  Toeplitz  matrices  credited  to  Grenander 
and  Szego  [33,  Ch.  5],  and  some  results  of  Gray  [31]. 


4.  2.  1 The  Cost  Function 


The  optimality  criterion  from  (4.  1-35)  is 


•1  = Ej/'1  Q UN~  1 ] 


= Tr 


qe/-1/-1  ) 


(4.2-4) 
(4.  2-5) 


J has  the  following  upper  bound 

. T 


Tr 


QEumN"1yN"1  ) 


T 


5 X (Q)  Tr  E f IJN~  ’ IJN*  1 ] (4.2-6) 

max  U — — 


o 
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Thus  for  this  case  at  the  maximum  value  of  J the  input  process 
is  such  that  it  has  the  largest  value  for  the  effective  spectral  area 
in  the  sense  of  (4.2-3). 

The  results  on  Toeplitz  asymptotic  theorems  used  here  have 
been  applied  by  Gray  [31]  in  deriving  information  rates  of  ARMA 
processes.  This  was  the  first  time  that  such  theory  was  applied 
in  engineering  problems.  Even  though  the  results  of  the  following 
sections  related  to  the  Toeplitz  matrices  are  not  new,  the  application 
of  such  theory  in  the  present  context  has  enabled  us  to  obtain  new 
insight  into  the  behavior  of  random  input  signals.  Such  a charac- 
terization has  many  attractive  features  for  theoretical  as  well  as 
numerical  studies.  We  will  use  some  basic  theorems  on  asymptotic 
eigenvalue  distribution  of  Toeplitz  matrices  credited  to  Grenander 
and  Szego  [33,  Ch.  5],  and  some  results  of  Gray  [31]. 


4.  2.  1 The  Cost  Function 


The  optimality  criterion  from  (4.  1-35)  is 


QUN_1] 


= Tr  qe u(uN~ 1 uN" 1 ) 


(4.  2-4) 


(4.  2-5) 


J has  the  following  upper  bound 


Tr  f Q E (UN~  1 LJN~  1 )]  * X (Q)  Tr  E [ UN"  1 IJN " 1 ] (4.2-6) 

U — — max  u — — 


f 


The  optimal  value  of  J is 


If  the  inputs  arc  stationary  with  finite  energy  then 


ma  x 


where  c is  a constant.  We  will  show  that  when  the  inputs  are 


asymptotically  stationary  J has  the  optimum  value  in  the  sense 


max 


where 


is  the  input  energy  constraint.  When  the  input  noise 


We  will  show  that  J attains  its  optimum  value  asymptotically 


when  the  following  relationship  is  satisfied 


TrfMM  ] 


(X)  d X 


Thus  at  the  maximum  value  of  J the  corresponding  input  process 


also  has  the  largest  spectral  area  in  the  sense  of  (4.2-11).  When 


the  condition  of  stationarity  is  satisfied  the  cost  function  can  also 
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be  written  as 


= Y]  k y 

n n 


(4.  2-12) 


The  correlations  7^  are  given  by 


\-h\ 

J - TT 


s (X) c d X n ~ 0,  1 , . . 
uu 


(4.  2-13) 


Since  s (X)  is  symmetric,  J can  be  written  as 
uu 


" N-i 

= v / (X  ^n  C°S  nX)  Suu(X,dX 
-/0  ' n-0  ' 


(4.  2-14) 


This  can  be  interpreted  as  the  area  under  the  product  curve  of 
N-  1 

s (X)  and  X (x  cos  nX  . Since  the  first  term  in  the  parentheses 

uu  7—/  n 

n=0  _ 

in  (4.  2-10)  is  fixed  the  maximum  value  of  J is  a function  of  s (X)  , 

uu 

the  spectral  density  which  must  be  shaped  so  as  to  maximize  the 

/N-l  x 


area  of  the  product  curve 


, IN  - 1 

(Z  *„ 

' ^ -n 


cos  nX  1 s (X)  . In  the  next 
/ uu 


few  sections  we  discuss  these  aspects  with  reference  to  the  finite 
order  linear  inputs. 


4.  2.  2 The  Autoregressive  AR(p)  Inputs 

Consider  the  finite  order  autoregressive  process 


u,  = Y'  a.  u,  . + k,  , k ;>  0 
k i k-i  k 

i- 1 


uk  * 0 


k < 0 


(4.  2-15) 


{ V } is  a zero-mean  white  noise  sequence  with  finite  variance  Q 
^ k v 


We  can  write  (-1.2- 15)  as 


N-l  N-l 

V - A U 


where 


■ a 


-O', 


ttl  ! 


-a  -a 

P*.  P-1 


0 -a  -a 

P P- 1 


Rewriting  (4.2-16)  as 


N-l  -1  N-l 
U = A V 


The  correlation  matrix  becomes 


2 T - 1 
C = trf  t A1  A) 

uii  V 


T 


N X N 


matrix 


(4. 2-16) 


(4. 2-17) 


(4. 2-18) 


(4.  2-19) 


The  matrix  A A is  shown  in  Fig.  (4.2-1).  Instead  of  using  the 


'f  - 1 . T 

matrix  (A  A)  consider  its  inverse  A. , = A A . The  elements 

J\ 


of  matrix  A A are  given  by 
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N-max(i,  j) 

AN(i’j)  = Z 

k--o 


rJ  r,  i . . , U = \ 

k kt  | i-j  0 


ii  J — 1 , 2 , ■ ■ • i N 


(4. 2-20) 


Since  0:^  = 0 for  k>  p , all  entries  more  than  p diagonals  away 

from  the  main  diagonal  of  A are  zero.  For  large  N all  entries 

of  A^  except  those  in  the  p x p submatrix  in  the  lower  right  hand 

corner  depend  only  on  | i-j  | , Hence  the  inverse  of  C represented 

uu 

by  A closely  resembles  a Toeplitz  matrix.  This  is  the  key  point 


to  relating  the  trace  of  to  the  integral 


if 

2T7  I 

•/_  TT 


s (X)  dX  . 

uu 


Define  the  N X N symmetric  Toeplitz  matrix  T as 


tn  = t 


«*)]  = “kj1  = [t|  m , 1 = £'W,|k 


i |k  - j | x 


dx  (4.2-21) 


f(x)  is  a real-valued,  continuous,  bounded  function  on  [-7:,  tt]  such 


j-r 

2 IT  I 

•'  tt 


dx^n  f(x)  > 


(4. 2-22) 


so  that  f(x)  > 0 almost  everywhere.  Let  us  refer  to  this  class  of 
function  as  L,  . 


Definition  4,  2.  1 [33,  p.  62].  Consider  the  nonnegative  sequences 

{a,  } and  [b,  } . Assume  that  for  each  k and  N 

k,  N k,  N 


ak.N<R'  bk,N<U 


(4.  2-23) 
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where  R is  independent  of  both  k and  N . The  sets  {a  } and 

k,  N 

(hk  N ^ arc  Said  t0  bc>  asymPt0tically  equally  distributed  as  N - 03 


lim  ^Z  *'<a  > - F(b  ) = 0 

N-.CO  k_-.j  K> 

where  F(t)  is  an  arbitrary  continuous  function  on  [0,  It] 


(4. 2-24) 


Definition  4.2.  2 Let  bean  N X N Hermitian  matrix  with 

entries  and  eigenvalues  ^ , k=  1,  2,  ....  N.  The 

strong  norm  of  L is 


LNil  = maX  |Xk(N 

K 


(4.  2-25) 


The  weak  norm  of  L is 

N 


lb,'  = iwl2f 


i = l i---l 


£ E i\ 


(4. 2-26) 


Definition  (4.  2.  3)  \Vc  say  that  two  sequences  of  Hermitian  matrices 

^ and  (G„J  exhibit  mutual  approximation,  denoted  by  L ~G 
in  in  -*  N N 

if  they  possess  the  following  properties. 


'V'  Hgn1l  'V-  »cni 


(4.  2-27) 
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*im  1 ln  ' gn 

I'D*  " 


(4. 2-28) 


Now  we  state  the  fundamental  theorem  on  the  asymptotic  eigenvalue 
distribution  of  Tcoplitz  matrices  (See  [33,  theorem  5.2,  p.  64]). 


Theorem  4.2.  1 (Asymptotic  Eigenvalue  Distribution  of  Toeplitz 
Matrices). 

Let  f(x)  be  a real-valued  function  of  class  L . 

Denote  by  m and  M the  essential  lower  and  upper  bounds 
of  f(x)  respectively,  and  assume  that  m and  M are  finite.  Let 
F(X)  be  any  continuous  function  defined  in  the  finite  interval 
m ^ X ^ M . Let  X^  ^ , k = 1 , 2,  . . . , N be  the  eigenvalues  of 
the  corresponding  Toeplitz  matrix.  Then 


n”  «&***■»"  rK 


(4.2-29) 


The  next  theorem  is  the  basic  theorem  for  finding  the 
asymptotic  eigenvalue  distribution  of  Hcrmitian  matrices  approxi- 
mated by  Toeplitz  forms.  We  state  this  theorem  as  given  by  Gray 

[32]. 


Theorem  4.2.2  Given  a Toeplitz.  matrix  L and  a Ilermitian 
mala*  CN  such  that  LN  ~ CN  . Let  (“k  N )^=]  and  tPk_  N )" 


rm 
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be  the  eigenvalues  of  I.  and  Cr  , respectively.  Since  L and 

N N N 

G^  are  bounded,  there  exist  finite  numbers  ni  and  M such  that 


m5“k,K’  Pk,NSM’  k=1'2 N 


(4. 2-30) 


Let  F(x)  be  an  arbitrary  continuous  function  on  [m,  M]  . Then 
N N 

l l 

lim  — Ffo;  ] = lim  — Ff  3 1 

N ' L k.  NJ  N l-  nj 


N—  n a k’N  n~*“  n 


if  cither  of  the  limits  exist. 


(4. 2-31) 
K 


Theorems  (4,  2.  1)  and  (4.2.2)  are  applied  to  the  matrix  A A 

to  obtain  the  desired  result.  Let  us  identify  the  N X N matrix 

T 

A A , which  is  Hermitian  and  positive  definite,  as  G . Let 

hi 


°N  = aTa 


Define  the  N X N Toeplitz  matrix  L^  as 


(4. 2-32) 


ln  = T 


f(x) 


=E  vwii-ji  (4-2-33) 


k=0 


with  elements  beyond  the  ptli  diagonal  equal  to  zero.  L^.  and  G t 
agree  element  by  element  except  in  a p K p submatrix  at  the  lower 
right  corner.  It  is  shown  by  Gray  [31]  that  the  Hermitian  matrix 
G^  is  approximated  by  the  Toeplitz  matrix  L in  the  sense  that 


ljm  I " CV  I 0 
N—  N K 


(4.  2-3  1) 


is  locplitz  it  follows  from  theorem  [A.  2.  1) 


Since  the  matrix  1. 


N 


that  the  eigenvalues  j of  are  distributed  asymptotica 


as  f(\)  where 


Further 


f(X)  = 1 1 Qk^"ikX|2 

k-j 


-[£/_ 


lly 


(4.2-35) 


- TT 


< co 


(4. 2-36) 


s inc  e 


° < f<x) s (]C  i«k  1 1 < « 


' k=0 


(4.  2-37) 


Hence  from  (4.2-34)  and  (4.2-35)  the  conditions  of  theorem  (4.2.2) 
are  satisfied  giving  the  result  that 

N N 


k,  N' 


(4.  2-38) 


lim  Sf  S F(\  N)  = lim  j,  F(P 
N-“  k=l  ’ N-»  N kM 

N 

where  fPk>N}k  = ] are  the  eigenvalues  of  = aTA  . Thus  theorem 
(4.  2.  1)  and  (4.  2.  38)  yield) 

N 

> . 1 ( _ I ....  I . 

(4.  2-39) 


nE^n'-s/  FH"X 


Note  that  Pk  N > 0 V k . and  letting 


’ — 1 wimiwii 


■ | 


wc  finallv  obtain 


. N 

lim  i V (T1  = — f 

N-™  N ff,  k’K  2*J_„  «x> 


d A 


w bore 


(4.  2-41) 
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3 


^>  = i'-E  <v'iUi2,  w 


(4.  2-42) 


k=l 


But  for  the  autoregressive  process  AR(p)  the  spectrum  is  given  by 

->  P 

2 / I 1 - “ 

k=  1 


■UI1ft>-o;/ii-£v-"»i2 


, A e [-tt, 


(4.  2-43) 


Hence  we  have 


/IT 

s (A)  d A 
uu 


(4.  2-44) 


We  have  the  following  theorern  for  autore 


gressive  inputs. 


.Theorem  4,  2.  3 F or  the  autoregressive  inputs 

P 

u = V tt.u  . + V 

k * j i lv  - i k 

i=l 

the  trace  of  the  input  correlation  matrix  is  related  to  the  spectrum 
of  the  corresponding  stationary  process,  s (A)  in  the  sense 


lim  ~ T r [A  J A] ' 1 

|\J  -*  OO 


2uO 


/: 


uu 


s (A)  dA 
uu 


(4.  2-55) 
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L 


where  the  spectrum  of  the  stationary  AR(p)  inputs  is 

Suu(X)  akd"lU^’  X€ 

k-  1 


(4. 2-56) 
E 


For  a given  value  of  CT^  , the  variance  of  white  noise  generat- 
ing the  inputs,  we  have  from  (4.  2-  10)  that  at  the  maximum  value  of 
J the  corresponding  effective  spectral  area  of  the  input  spectrum 
also  has  the  largest  value.  That  is 


/■nr 

7777  dX 
f(A) 


for  large  N which  follows  from  (4.2-55). 

4.  2.  3 The  Moving  Average  MA(cj)  Inputs 
This  class  of  inputs  has  the  form 

q 


u ~ V 
k k 


. V £ V 
< *i  k-i 

i = l 


For  N stages  this  can  be  written  as 


N-l  _ N - 1 
U = - V 


where 


-4 


q*  • 


•-6, 


N X N 
matrix 


(4.  2-57) 


(4.  2 - 5 S ) 


(4. 2-59) 


(4.  2-60) 
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The  correlation  matrix  is  given  try 


E [ U U1  ] ^cr?'  E E T 

i J v 


T 


(4.  2-61) 


The  eigenvalues  of  are  the  same  as  that  of  n which  has 

T 

the  same  form  as  A A with  substituted  for  ry  up  to  p = q in 

__  t __ 

Figure  (4.  2-1).  Thus  the  matrix  — ■=■  can  be  approximated  by  a 

Toeplitz  matrix  as  shown  in  section  (4.2-2)  where 


f(X) 


1 


-E«, 


- ikX  i 2 


, X £ [-ir,  tt] 


k=  1 


If  (Xj_  ^ } are  the  eigenvalues  of  — H we  have 


lirn 

N~* 00 


N 

k-1  J-v 


The  following  theorem  now  follows: 


Theorem  4.2.4  For  the  finite  moving  average  inputs 


(4.  2-62) 


(4.  2-63) 


u,  = V - > tv 
k k ^ i k-i 

i-1 


(4.  2-64) 


The  trace  of  the  input  cor  relation  matrix  is  related  to  the  spectrum 
of  the  corresponding  stationary  process,  s (X)  in  the  sense 


uu 


1 im 
N-* » 


ir 

j;T,pHTl=_L  | »uuW<U  (4.2-65) 

2ttct  J 


V -IT 

where  the  spectrum  of  the  stationary  MA (q)  inputs  is 


,<x>  = % ' 1 -2.  «■ 


, X t [-TT,  TrJ 


(4. 2-66) 


For  the  moving  average  inputs  we  can  obtain  an  estimate  of  the 
error  due  to  the  above  approximation.  For  MA(2)  process 

^ Trace  [HHT]  = ~ N 4 (N  - 1 ) + (N  - 2)  d (4.2-67) 

N N 1 2 


, , 1 .2  , . 2 .2 

1 + ' - N h * ' - N 


= 2 ,J  I1  +?i<:'1X+l2e'ia|  dA 


2ttO  J _ 
V 77 


.2  .2 
T 1 +C1  +42 


Error  = 


W'  •. 

T CJ  _ rr 


2ira  •/_ir 

P 


(X)  d X - — T race  [ ] 


‘-T,  < + 2«a 


From  the  constraint  region  for  (£j  , £ ) v/e  have  an  upper  bound 


for  the  error  e . 


Percentage  error 


(4.  2-72) 


vj 


I 


U 
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Wc  arc  interested  in  showing  that 


1 r -1-  T -T, 
- Tr[A  -E  A J 


2it a2  ^ 


(X)dX 


(4.  2-77) 


as  N 30  . 
process  - 


s (X)  is  the  spectral  density  of  the  stationary  ARMA 
uu  ’ 1 


s 

uu 


(X)  = o 


2 

V 


£ «, 

k-  1 


■ikX 


1 • £ 

k=l 


-ikX 


, X 6 [ - TT  , TT  ] 


(4.  2-78) 


The  results  summarized  below  are  obtained  from  Gray  [3  1]  and 
applied  to  the  optimization  problem  of  input  synthesis.  The  impli- 
cation of  these  results  are  also  discussed.  A complete  summary 
of  the  asymptotic  eigenvalue  properties  of  Toeplitz  matrices  is 
given  in  Appendix  E. 

Eigenvalues  of  the  product  of  nonsingular  matrices  is  un- 
affected by  a cyclic  rotation  of  the  matrices,  that  is 


-1  T-T  _T_1  T 

X[A  HE  A ] = X [ A A"  -H  ] (4.2-79) 

This  is  clear  because  the  above  operation  is  like  affecting  a simi- 
larity transformation  under  which  the  eigenvalues  are  invariate. 

Now  wc  can  write 
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T - 1 T 


an=a-'a-‘ssi.-(aaT)-1hst 


N 


(4.  2-80) 


The  matrix  is  of  the  form 


A = A A 
N 1 2 


(4.  2-81) 


T . T 


Aj  = AA  . A --  HH 


(4. 2-82) 


T 1 


J 


N 


is  not  a Toeplitz  matrix.  But  we  have  seen  that  the  eigenvalues 


of  Aj  and  A2  can  be  approximated  by  the  Toeplitz  matrices 
T |f(x)  and  T |g(x)j  respectively,  where 


f (X)  = I 1 - O', 


-ikx 


, x € [-TT,  tt] 


k=  1 


and 


(4. 2-83) 


g(x)  = 1 1 - j]  4 


■ ikx 


x € [ -tt,  it] 


k = l 


(4. 2-84) 


We  use  the  following  theorem  of  Gray  [31]  to  show  that  the  trace  of 
^2  ^2  approximates  the  desired  integral. 


aLh.cor5:1Tl  4.  2.  5_  Let  f(x)  and  g(x)  belong  to  class  L with 
f(x)  > 0 everywhere.  Then 


lim  I T 
N-® 


g(x) 


- 1 


fM 


T 


and  the  eigenvalues  of  T g(x) 


buted  as  g (x)  / f(x)  , x € [-tt,  tt]  . 


- T 
- 1 


g (x ) / f (x) 


= 0 


(4.  2-85) 


f(x) 


re  asymptotically  distri- 


ct! 


1 


n 


rs 


Corolla  rv  -1.2.  1 Het  and  M be  Hcrmitian  matrices 

approximating  T^g]  and  T [f]  respectively.  Then 

lim  | L M'1  - T [g/f]  | =,  0 (4.2-86) 

N-.CO 

63 

The  proof  of  the  above  results  are  given  in  Gray  [3!].  Those  results 
can  be  applied  to  the  matrix 


an  = (AAT)-1(HH)T  = Ai-1A2 


(4.  2-87) 


We  know  from  results  of  sections  (4.  2.  2)  and  (4.  2.  3)  that  the 

T p 

matrices  AA  and  — can  be  approximated  by  Toeplitz  mat  rices 


f(x)  and  T 


g(x) 


Thus  from  the  theorem  (4.  2.  5)  and  corollary 


(4.  2.  1)  v/e  have 


lim  ( A j 1 A - T [g/f]  I = 0 (4.2-88) 

N~*  “ 

Hence  the  eigenvalues  of  A?  are  distributed  asymptotically  as 
g(x)/f(x)  , x 6 ff]  . It  is  also  shown  by  Grjiy  that  by  a similar 

argument  as  above  if  g(x)  > 0 everywhere  then  the  eigenvalues  of 

A -1 

A are  distributed  asymptotically  as  f(x)/g(x),  x £ [-tt,  tt]  . Thus 
if  either  f(x)  > 0 or  g(x)  > 0 everywhere  then  corollary  4.  2.  1 gives 
the  asymptotic  beheivior  of  A^  or  A^  respectively . The  result 
foi  the  ARMA(p,  q)  inputs  is  summarized  in  the  following  theorem. 
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At  the  optimum  value  J the  spectral  curve  must  be  shaped 
in  sucli  a way  tiiat  the  cost  function  J is  maximized  and 


.tt  , N-l 


~ ~ ( ( y j H cosnXjs  (X)  d X 

17  J \m0  n / UU 


(4. 2-96) 


when  a is  fixed  the  optimum  J is  obtained  for  which  the  effec- 
tive spectral  area  of  the  input  process  is  also  maximized  in  the 
sense  of  (4.  2-94).  In  this  case 


1 2 

lim  — J = X (Q)  a ' sup  — 
. _ N max  V . \ 2 

N-1”  (a,£) 


V/ 


dX  (4.2-97) 


The  results  are  studied  numerically  in  the  next  section  where 
different  input  sequences  are  compared.  The  assumption  of 
s (X)  > 0 is  satisfied  by  all  the  inputs  considered  in  this  section, 
and  is  a necessary  and  sufficient  condition  for  the  basic  Toeplitz 
theorem.  This  condition  is  also  referred  to  as  "persistently 
exciting"  by  Mehra  [62]  and  Aslrom  and  Bohlin  [6].  This  condition 
makes  the  input  correlation  matrix  positive  definite  and  is  a 
sufficient  condition  for  the  maximum  likelihood  estimates  to  be 
efficient  [6],  Here  we  see  that  this  condition  is  being  satisfied 
naturally  in  the  development  of  the  asymptotic  properties  for  the 
input  processes  generated  by  linear  models  discussed  in  this  section. 


The  conditions  stated  by  earlier  investigators  is  embedded  in  the 


results  derived. 
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x-  ^ The  circuit  diagram  for  generating  the  ARMAfp,  q)  inputs 


! : 


can  be  interpreted  as  the  u passed  through  an  autoregressive 

K 


filter  and  the  noise  V passed  through  a moving  average  filter  in 


cascade.  Figure  (4.2-2)  shows  this  block  diagram. 


4.3  Numerical  Results 


In  this  section  we  will  present  two  examples  of  single -input 
and  single-output  system.  Comparison  of  the  different  inputs 
obtained  by  the  stationary  models  is  made.  The  system  is  repre- 
sented by 


k = E Vk-i  V'k-i 


+ w 


k 


(4.3-1) 


i = l 


i-  1 


Wk  ~ ° ^ 


(4.  3-2) 


We  consider  first  order  and  second  order  systems  obtained  from 

(4.3-1). 


(i ) Second  Order  System 


+ <Vk-2  ,p  Vi  lwk 


(4.3-3) 


The  system  has  the  transfer  function 


C(z) 


P. 


-1 


. -1  -2 

1 - y.  - z 


(4. 3-4) 


The  frequency  response  is  given  by 


G(f)  1 = 


, -iZ~F  -i47rf 

1-G,e  -P2c 


Simplifying 


G(f)  - 


2 . 2 


. If!  ^ i 


1 +Pj  + P2  - 20j  (1  -<p  ) cos  2 it f - 20  cos  4-f  2 


f 


(ii)  First  Order  System 


>'k  = u>Vi  + Pvi  *"-k 


Frequency  response  is  given  by 


-i2iTf 


G(f)  = 


. I f I 2 k 


l - o 


Simplifying 


I G(f)  1 = 


1 + p - 2<p  cos  2tTf 


Only  input  processes  up  to  second  order  are  considered.  The 
theoretical  spectrum  of  these  give  processes  is  given  below. 
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(i)  AR(2)  Process: 


u.  “ ai  u,  , + 0A  u,  + w 
k 1 k- 1 2 k - 2 


(4.  3-8) 


Spectrum 


su(f)  = CTp  (!  + aj  + a2  ' 20f  i (1  ' a 


) cos  2irf  - 20  cos  4tt£ 

^ 2 i 


f|s£  (4.3-9) 


(ii)  MA(2)  Process 


"k^rPt.i  -Vu 


Spectrum 


Bu(f)  ~ °l/  ( 1 + £ 2 + 4 2 ■ 2 £ | n - t-J  COS  Z-nl  - 2 cos  4n\f j (4.3-11) 


(iii)  ARMA(  1,1)  Process 


“k^iVP'V  *1  Vi 


(4.  3-12/ 


Spectrum  is 


_ 1 + £ - 2 £ cos  2irf 

su(I)  2 • lflsi 

1 -1  Ofj  - 20f^  cos  2tt[ 


(4.  3-13) 


(iv)  A R (1 ) Process 


“k  =ai  Vi 


(4.3-U) 
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The  spectrum  is 

EuW  = % (1  f °1  ' 2°1  COs  2Trf)  [f(si'  (-1.3-15) 
(v)  MA ( 1 ) Process 


u,  - P,  - £ V 
k k s 1 k- 1 


(4.3-16) 


The  spectrum  is 


su(f)  = °l  {1  + *1  ' 2^“l  cos2^)-  Uls£  (4.3-17) 

I he  maximization  of  the  cost  function  with  respect  to  the  parameters 
requires  these  to  satisfy  certain  constraints  such  that  the  stability 
and  invertibility  of  the  input  processes  is  ensured.  The  constraint 
regions  are  as  follows: 


AR(1)  Process:  J or ^ | < 1 

MA(1)  Process:  |£j  [<  1 


AR(2)  Process: 


“2  +“,<  1 


a _ a < i 

2 1 


|a2  ' < 1 


(4.  3-18) 
(4. 3-19) 


(4.  3-20) 


The  constraint  regions  for  AR(2),  MA(2)  and  ARMA(i,  1)  are  given 
in  figures  (4.3-1)  and  (4.3-2). 

In  the  following  examples  the  parameters  have  three  discrete 


values  with  equal  probabilities  as  follows: 


*1 

2 

P 

1.  5 

-.  7 

1. 

(actual  values) 

1.  1 

-.  4 

. 6 

1.6 

-.  8 

1.  1 

m 


Under  the  assumption  that  the  input  process  is  stationary  the 
optimal  inputs  are  computed  for  the  above  five  input  models.  The 
input  par  ameters,  optimal  cost  functions  and  the  theoretical  effective 
spectral  areas  are  compared  in  tables  (4.  3-  1 ) - (4 . 3-3).  Table 
(4.3-3)  gives  results  when  the  O value  for  the  first  order  system 
is  changed  from  I to  .1  . Optimization  for  the  second  order 
system  is  made  by  taking  P as  a known  parameter.  Figures 


n 


o 


(4.  3-3a)-(4.  3-3f)  show  the  frequency  response  of  the  second  order 
system  and  the  spectrum  of  input  processes.  Figures  (4.  3-4a)~ 

(4.  3-4f)  arc  the  frequency  response  for  the  first  order  system  with 
p - . 7 and  figures  (4.  3 -5a) -(4.  3-5f)  correspond  to  p - - . 7 . The 
autocorrelation  of  the  optimal  AR(2)  inputs  for  the  second  order 
system  and  the  ARMA(1,  1)  inputs  for  the  first  order  system  for 
p = .7  and  p = - .7  are  shown  in  figures  (4.  3-6)-(4.  3-8)  respec- 
tively. When  the  autocorrelation  function  slowly  decays  exponentially 
we  see  that  the  spectrum  of  the  process  is  dominated  by  low- 
frequencies.  This  results  in  the  neighboring  points  of  the  input 
sequence  being  similar  and  then  exhibits  marked  trends.  In  contrast, 
when  the  autocorrelation  function  alternates  in  sign,  the  spectrum 
is  dominated  by  high  frequencies  and  the  input  sequence  tends  to 
oscillate  rapidly. 

Figure  (4.3-9)  shows  the  system  response  and  input  spectrum 
of  AR(2),  MA(2),  ARMA(1,  1)  processes  for  the  second  order  system. 
It  is  seen  here  that  the  power  of  the  input  spectrum  is  concentrated 
at  frequencies  where  the  system  gain  is  also  high.  Thus  we  see  a 
matching  of  the  frequency  responses.  This  tuning  is  caused  due 
mainly  to  the  fact  that  the  parameter  space  is  restricted  to  a small 
region  surrounding  the  actual  parameter  values.  The  same  is  also 
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true  for  the  first  order  system  as  evidenced  by  figure  (4.  3-10). 


A032  737 


UNCLASSIFIED 


CALIFORNIA  UNIV  SAN  DIEGO  LA  JOLLA  DEPT  OF  APPLIED  M— ETC  F/G  12/2 
SYNTHESIS  OF  INPUT  SIGNALS  IN  PARAMETER  ESTIMATION  PROBLEMS. (U) 

1975  B R UPADHYAYA  AF-AF0SR-2839-75 

AFOSR-TR-76-1175  NL 


3 if- 

IS 3273 7 


167 
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The  frequency  corresponding  to  the  peak  gain  is  given  by  [15,  p,  60]. 


cos  2iTfp  = 


For  the  second  order  system,  = 1.  5 , <p  = - . 7 , Henc< 


fQ  = .0738  Hz 


For  the  optimal  AR(2)  input,  <^  = 1.8,  a =-.95,  and 


fQ  = . 0628  Hz  . 


The  learning  performance  for  the  second  order  system  with 
different  random  inputs  is  compared  in  table  (4.3-4).  Comparison 
is  also  made  with  pure  white  noise  as  the  input.  These  results 
correspond  to  a total  experimental  length  of  N = 20  . AR(2)  inputs 

give  the  best  parameter  learning;  the  moving  average  signal  being 
the  least  favorable.  Comparison  of  the  cost  functions  indicates  that 
the  autoregressive  and  ARIVfA  inputs  perform  better  than  the  moving 
average  inputs.  The  optimization  problem  is  easy  for  input  process* 
up  to  second  order.  Beyond  this  the  constraint  region  becomes  very 
complicated  and  the  optimization  requires  expensive  computation. 

It  is  pointed  out  earlier  that  the  optimal  input  spectrum  is 
such  that  there  is  maximum  transfer  from  the  input  to  the  output 


at  frequencies  where  the  cor  responding  gains  match.  In  order  to 
investigate  if  this  is  the  result  of  tuning  the  parameter  sets  close 
to  the  actual  values  of  the  system  parameters  experiments  are 
conducted  with  the  number  of  discrete  parameter  sets  m = 3,  5, 

10  and  15.  The  parameter  sets  are  chosen  to  include  the  stable 
region  of  the  second  order  system.  For  the  same  input  process  it 
is  seen  that  the  learnings  with  different  m do  not  show  marked 
difference.  Hence  it  is  not  possible  to  correlate  the  number  of 
parameter  sets  m and  the  learning  performance.  Figure  (4.3-11) 
shows  the  spectrum  of  AR(2)  inputs  and  the  frequency  response  of 
second  order  system.  This  indicates  that  as  the  parameter  set  is 
reduced  in  size  the  optimal  input  spectrum  gradually  matches  the 
system  gain.  No  general  conjecture  can  be  derived  from  these 
regarding  a specific  behavior  of  the  input  process.  However,  this 
suggests  a sequential  approach  to  the  design  problem.  This  can  be 
obtained  by  successively  reducing  the  parameter  space  after  each 
experimental  stage.  As  the  parameter  space  is  reduced  successively 
the  best  input  spectrum  will  emerge  eventually  as  the  optimal 
spectrum.  Further  theoretical  and  computational  analyses  are 
needed  to  explain  the  above  observations. 


• 

Table 

4.  3-1 

Second  Order  System 

1 

f s (X)  dX 
J uu 

— TT 

Input 

Process 

Optimal 

Cost 

2ttC2 

V 

Input  Parameters 

AR(2) 

3401. 3 

69.  33 

“l  =1*  8*  a2  = "•  95 

MA(2) 

1524. 6 

3.  113 

Cj=-1.  l.£2=-.95 

ARMA(1,  1) 

2146.6 

12.676 

=.85,  =-.95 

AR(1) 

2008.  1 

5.  263 

= . 95 

MA(  1 ) 

967.  7 

1. 903 

4,  =-95 
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Table  4.  3-2 

First  Order  System,  O = -.7 

1 


Input  Optimal  ^2 

Process  Cost  P -tt 


/ 


s (X)  dX 

uu  Input  Parameters 


AR(2) 

500. 03 

8.  276 

7,  ccz=.  25 

MA(2) 

204.  8 

3.  113 

^1  = !.  1.  £2  = -.95 

ARMA(  1 , 1) 

507.  3 

38.  026 

«1  = -.  95,  £ j=.  95 

AR(1) 

496.  2 

10. 256 

<*!  = -. 95 

MA(1) 

146.  1 

1.908 

^=•95 

J 
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Input 

Process 

Optimal 

Cost 

1 f 

y 1 s (X)  dX 

Zna2  J 

V -TT 

Input  Parameters 

AR  (2) 

500. 03 

8.  276 

V-7.  V-25 

MA(2) 

204.  8 

3.  113 

1.  42=-.95 

ARMA(1,  1) 

507.  3 

38. 026 

«!=•  95,  «2  = -.  95 

AR(1) 

496.  2 

10. 256 

=. 95 

MA(1) 

146.  1 

1.  903 

Sj  =-.95 

attfl  - Wt 

IWPlIWi 


n 


FIGURE  4.2-2 

Block  diagram  for  generating  ARMA(p.q)  inputs. 
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FIGURE  4.  3-3c  and  4. 


c)  Spectral  density  of  MA(2)  inputs 


3 - 3d 


d)  Spectral  density  of  AR MA ( 1 , 1)  inputs. 


FREQUENCY, HZ  FREQUENCY,  HZ 


FIGURES  4.  3-4c  and4.3-4d 


c)  Spectral  density  of  MA(2)  inputs 

d)  Spectral  density  of  ARMAfl,  1)  inputs. 
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FIGURE  4.3-7 

Autocorrelation  function  of  „pllmal  AItMA(1.  „ inpuU 
lor  the  first  order  system,  p = ,7< 


C 


■ i 


FIGURE  4.3-10 


Comparison  of  the  frequency  response  of  first  order 
system  (p  = . 7)  and  the  spectral  densities  of  AR(2), 
MA(2)  and  ARMA(1,  1)  inputs. 


Frequency  response  of  first  order  system 
Spectral  density  of  AR(2)  input 


Spectral  density  of  MA(2)  input 


Spectral  density  of  ARMA(1,  1)  input. 


CHATTER  V 


DESIGN  OE  FEEDBACK  SIGNALS 


The  optimal  inputs  obtained  in  Chapter  3 do  not  make  use  of  the 


measurement  information 


The  inputs  obtained  this  way  are 


called  open-loop  inputs  and  have  a mapping  of  the  form 


UkL  = UkL[^  ’ B N kr  °’  l’ N"  1 


(5.  1) 


where  ^ is  the  prior  information  about  the  system  parameters  and 
the  state,  and  B is  the  optimality  criterion.  In  this  chapter  we  will 
consider  feedback  signals,  that  is  signals  that  arc  generated  with  the 
knowledge  of  measurements.  In  particular  we  consider  open-loop 
feedback  signals.  At  each  stage  k,  the  future  inputs  are  obtained  by 
taking  another  measurement  and  proceeding  successively.  As  we 
will  see  the  feedback  is  not  always  active  when  the  unknown 
parameters  are  contained  only  in  the  gain  parameters.  For  the 
linear  Gaussian  case  the  interpretation  is  obtained  as  the  sum  of 
open-loop  input  at  stage  k and  a correction  term.  The  idea  of 
open-loop  feedback  controllers  was  introduced  by  Dreyfus[23] 
and  since  then  many  investigators  have  applied  this  control  policy 
(which  is  better  than  the  open-loop  policy  but  not  as  efficient  as  the 
closed-loop  policy)  in  the  solution  of  control  problems[S,  77  ]. 
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Lopez-Toledo[56]  used  feedback  signals  generated  from  affine 


laws  of  the  form 


uk  =^Ei(k)2k-i 4 Mk) 


(5.  2) 


where  g.(k)  and  h(k)  are  deterministic  quantities. 

In  Section  5.  1 v/e  review  classification  of  optimal  inputs. 
Section  5.  2 presents  the  synthesis  of  open-loop  feedback  signals. 
Sections  5.  3 and  5.  4 discuss  the  characteristics  of  feedback  inputs 
and  comparison  with  the  open-loop  inputs.  An  example  is  presented 
in  Section  5.  5 illustrating  the  sysnthesis  of  feedback  inputs. 


5.  1 Classification  of  Feedback  Inputs 


Fet  the  measurements  upto  time  k be  denoted  as 

k 

Zk^Zk(Uk'1)^|zi|  - lfik-N  (5.1.1) 

The  a priori  information  on  the  state  and  parameters  is  given 


*i{p(0),  p^)}- 


Depending  on  the  amount  of  information  used  by  each  type  of  input 
sequence  we  have  the  following  ramification. 


(i)  Open-loop  Inputs;  The  input  is  given  by  the  following  mapping- 


u-  = u-  Bn]  . k-0,  1 N-l 


mil 


" ■ ■ ■ ' 


The  inputs  are  defined  based  on  the  a priori  information  only. 


(ii)  Fc  cdback  Inputs:  Let  the  present  time  be  denoted  by  k.  Let  us 

k - 1 k 

assume  that  the  input  U has  been  applied  and  the  output  Z ' has 


been  observed.  We  want  to  determine  the  future  inputs  u.  based  on 
observations  upto  time  k.  This  open-loop  feedback  input  has  the 


following  mapping- 


UkLF  = UkLtf  -k’  y.k_1  ;0;  bn  ] . k=0,  1 N - 1 (5.  1-4) 


This  assumes  that  no  future  observations  will  be  made  and  that  the 
measurements  upto  time  k are  used  in  the  computation  of  present 
and  future  inputs. 


(iii)  Closed  -loop  Inputs:  This  signal  anticipates  the  fact  that  future 
learning  is  possible,  that  is,  the  knowledge  that  the  loop  will  stay 
closed  through  the  end  of  the  process  is  fully  utilized[78].  These 
inputs  have  the  mapping- 


CL  OL-  k k-1  t 

uk  =uk  V- 


\0  \ Bn  ] ,k=0,  1 N-l  (5.  1-5) 


where  £.  represents  the  future  observation  program  and  the 

K.  t IN 

associated  statistics.  For  a complete  discussion  of  various  control 
policies  in  stochastic  control  problems  sec  [9].  In  dual  control 
problems  and  nonlinear  problems  the  dependence  of  the  controller 
on  the  future  observation  program  comes  into  play  because  the 


covariance  appearing  in  the  expression  for  the  optimal  cost-to-go  is 


a function  of  the  control  itself. 
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In  the  current  discussion  of  the  input  synthesis  problem,  the 
dynamic  programming  formula4  ion  of  the  closed-loop  problem  shows 
that  the  closed  form  solution  of  the  problem  is  not  feasible.  Hence  we 
resort  to  the  open-loop  feedback  formulation  of  the  constrained 
optimization  problem.  Computationally  this  method  is  much  easier. 


5.  2 Synthesis  of  Open-loop  Feedback  (OLF)  Inputs 

The  closed-loop  optimal  policy  has  the  following  characteristics: 

k k-  1 

(i)  It  uses  the  knowledge  cf  the  observations  (Z  , U ) upto  the 
present  time  k. 


(ii)  It  takes  into  account  the  future  observation  program  £ and 

k,  N 

its  associated  statistics.  Thus 


uk  = uk  (Zk,  ) , k=0,  1 N-l  (5.2-1) 

The  set  of  a posteriori  probabilities  P(.  9 | Z ),  i=l,2 m;k=l,2, 

. ..,N-1  form  a sufficient  statistic  for  the  closed-loop  problem  and 
the  optimal  sequence  can  be  derived.  The  resulting  solution  is  c 
complicated  for  the  foloowing  reasons: 


(i)  The  functional  formulation  of  the  dynamic  programming  equation 
is  difficult  to  solve;  the  constrained  maximization  involved  in  solving 
this  equation  makes  the  resulting  optimization  quite  complicated. 

(ii)  The  implementation  of  the  closed-loop  inputs  requires  the  exact 

k 

knowledge  of  the  a posteriori  probabilities  P(.9|Z  ),  i=l,2,...,m; 
k=  1 , 2,  . . . , N- 1 . The  successive  optimizations  will  result  in  input 


sequences  that  are  nonlinear  functions  of  the  a posteriori  probabilities. 
Because  of  this  a closed-form  solution  is  not  feasible. 

We  use  the  approximation  given  by  the  open-loop  feedback 
approach,  detailed  below. 


Let  Uk,N  A |u  ,u  u I 

MVk  + i un-i| 


'L.  & U.k+1 


(5.?.-  2) 


(5.2-3) 


Denote  the  current  time  by  k . Let  us  assume  that  the  input  sequence 

k-l*  . ,k 

U has  been  applied  and  the  corresponding  observation  z 

1 i!i=i 

is  obtained.  Required  is  the  determination  of  future  inputs 
k N 

U ' based  on  the  information  at  time  k . Denote  the  optimization 
after  k stages  as 


max  m- 1 m 


v».  E E 6 [Vi2-£s’“‘  -Uk-N)lzl] 

k ki-l  j = i+l  l~-'- 


(5.2-4) 


This  can  be  written  as 


m- 1 m 


BN-k=ucU  2 2 P(:£|zk)P(i0|zk)  B..(.0,.0,Uk"1  . 

V u 1 J lJ  l~  j~  ~ 


k k i = 1 j = i+  1 


Uk'  N ) 


(5.2-5) 
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The  optimization  carried  out  by  (5.2-5)  gives  the  following  sequence 
of  inputs 


u jk,  k=0,  1,...,  N-l  , j=k,k  + l,...,  N-l  (5.2-6) 

The  optimum  OLF  sequence  results  from  (5.2-6)  by  choosing  those 
inputs  for  which  j=k 


r\ 


r 


•j. 

Uk  = Uk[k’  k"°’  1 N_1  (5.2-17  ) 

The  feedback  is  employed  in  the  sense  that  at  each  stage  of  the  open- 
loop  synthesis  an  observation  is  made,  thus  introducing  feedback.  In 
the  next  section  we  will  study  the  characteristics  of  OLF  inputs. 

5.  3 Characterization  of  Feedback  Signals  for  Linear  Systems 

For  the  linear  system  of  (5.  1-1)  and  (5.  1-2)  the  feedback  inputs 
can  be  interpreted  as  the  sum  of  open-loop  signal  and  a correction 

term.  Let 


w.  ~ G(0,Q)  (5.3-1) 

k 

vk~  G(0,  R ) (5.3-2) 


Using  the  results  of  Section  (3.2),  at  stage  k the  OLF  cost  can  be 
written  as 


m - 1 m 

E E 

i=l  j=i+l 


,k-l 


»N-k  * E E > iV-lz- 1 Vr ' • » 


(5.3-3) 


2M 


Using  (3.  2-20)  this  can  be  written  as 


m - 1 m 

Vk*  5 £ t r>(  e |zk)  P(  e |zk)  • 

i-  1 j = i + 1 J 


• (Uk-'",  uk’N,T  [m:1 


1..I*  „k.N 


1 r1  M..](uk‘1<'-  u 


lJ  lJ  IJ- 


Hi-  i iri 

Ck  = 4 Z Z P<i®.lz-k)  P(.0lzk)[MT  E~.!  M..1 

1 J L ij  ij  ijJ 

i=  1 j-i+1 

is  NXN  positive  definite  matrix.  Then 


[uk-:  *■«  f fc 


c u* 

12, k - 


C22,k  - 


Now  this  can  be  written  as 

Vk*  -k  ,t  Cn,k  Sk  '*  + yk’NT  c2z  k Uk,N  + 2 Uk,NT  cj2 


with  the  constraints 
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W “„©•••©  Vi*  {'ikNMlukNll  -vk-v- 


The  necessary  condition  for  optimality  is  given  by 


(5. 3-8) 


c„  tuk-N=  XukN+c7,  ,uk-1 

22,  k - - 12,  k - 


(5.3-9) 


^*22  k *S  (^-M  * (N-k)  symrnet  ric  positive  definite  matrix  and 

X > 0.  (5.3-9)  is  a nonhomogeneous  eigevalue  problem.  Now  a set  of 

N-k 

(N-k)  orthonormal  vectors  e (R  can  be  obtained  from  the  eigen- 

k “ 

N-k 

vectors  of  ^ * thus  spanning  R . In  a finite  dimensional 

vector  space  this  forms  a complete  set  of  orthonormal  vectors  [4,p  467] 
and  hence  the  solution  to  (5.  3-9)  can  be  written  as 


k,  N AT"' 

U = > a e 

- i - il 


(5. 3-10) 


where 


C22,k-ik  Xi  -ik  ’ ‘ “ lf  2'  * ' • ' N-k 


(5. 3-11) 


<3ik—  jk^>  ij 


(5. 3-12) 


Using  (5.3-10)  - (5.  3-12)  we  have 


<®iU  ’ ^k> 

0,  = , X * X.  , i = 1,2 N-k 

1 X -X. 


(5. 3-13) 
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where 


v Trk- 1 T N-k 

\ = C 1 2 , k — CH 


(5. 3-14) 


The  constant  X is  o^ch  that  the  resulting  solution  must  satisfy  the 
constraints  (5.  3-8).  (5.  3-10)  can  be  written  as 


,Tk,  N 

y-  =ai^ik  + 


N-k 
1 = 2 


:ik 


(5. 3-15) 


3 


where  e^j_  is  the  normalized  eigenvector  corresponding  to  the  largest 
eigenvalue  X^  of  ^ . £.jj_  can  ke  considered  as  the  solution  to 

the  open-loop  problem  at  stage  k . Rewriting  (5.3-15)  as 

N-k 


U 


kN 


ai  [-lk  4 S a — ik] 
i=2  1 


(5. 3-16) 


The  term 


N-k 


U 


kN 


D ^ 


■ik 


i=2 


(5.3-17) 


is  a correction  to  the  open-loop  inputs  e . Thus  we  can  write 

1 K 

kN^ 

U as 


, kN 


U 


U™  ‘ 4 UkN"‘ 
“O 1 — c 


(5. 3-18) 


The  effectiveness  of  the  feedback  inputs  depends  on  the  correction 

* 

kN  . . 

term  U . Initially,  for  a few  stages,  the  feedback  inputs  follow 


2 1 7 


win 


a..  = P(.0  |zk)  P(.0  |zk)  where  .0*0  Y-. 

1J  1 “ J — L — 0 1 


(5.  3-19) 


become  small  compared  to  the  terms 


a = P(.e|  7k)  P(0  I Zk)  where  .0  * 0n  V. 

lO  i—  —0  i—  0 i 


(5. 3-20) 


Then  a.  >>  a.,  and  the  discriminant  function  B . multiplying  a 

io  lj  ij  r > h iO 

will  dominate  those  multiplying  a..  . This  shows  that  the  distance 
functions  B..  which  do  not  contain  the  right  parameters  will  have 
minimum  effect  on  the  cost  function.  This  is  the  major  change 


the  open-loop  signal  and  then  the  correction  comes  into  effect. 
Towards  the  end  of  the  sequence  the  energy  of  the  inputs  becomes 
small  and  the  difference  between  feedback  and  open-loop  inputs 
reduces.  Thus  the  difference  in  the  two  signals  is  dominant  during 
the  stages  that  are  away  from  the  initial  and  final  stages  of  the 
signal  history. 

The  feedback  enters  the  optimization  via  the  a posteriori 
i k 

probabilities  P(.0JZ  ) , i=l,2,...,  m and  k=l,2,...„  N ; and 

j. 

1 *»•  ^ i* 

the  past  inputs  U in  the  form  of  the  vector  C , _ , U . Note 

1 2,  k — 

that  as  the  learning  approaches  the  correct  parameters,  that  is, 
P(.0|zk)  — 1 for  0_  = 0_o  , the  remaining  probabilities  become 
small.  Thus  the  cross  terms 


introduced  by  the  feedback  synthesis,  which  is  the  desired  one.  To 


L 


summarize,  the  feedback  inputs  arc  synthesized  by  vising  all  the 
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available  past  and  present  information.  The  optimal  cost  function 
reduces  the  effect  of  distance  measures  that  do  not  contain  the  true 
set  of  parameters,  thusmaking  it  a function  of  distance  B(9_  . 0) 
where  0^8.,  as  one  would  desire. 

l—  —0 


5.4  Comparison  of  Open-Loop  and  Feedback  Signals  When  the 
Unknown  Parameters  are  Contained  in  the  Gain  Vector 

Here  let  us  consider  the  case  where  the  unknown  parameters  are 
contained  only  in  the  gain  vector  . This  is  equivalent  to  the  transfer 
function  model  with  known  poles  and  unknown  zeros. 

Consider  a scalar  system  as  follows: 


xk  + i +f!uk 


*k " *k + vk 


v.  ~ G(0,C  ) , k- 1 , 2 N 

k v 


(5.4-1) 

(5.4-2) 

(5.4-3) 


The  pairwise  distance  is  given  by 


B = 4[ZN-ZN1  [ZN-ZN1 
‘J  2 L~l  - J J , 


(5.  4-4) 


v/here 


■ 


- ~ 


(5.4-5) 


pn~'s.  pn-23.  . ..'.I, 

i i i 


Remiting  B. . 

ij 


(£  -/3  ) T 

' i l TT  N- 1 . T . N- 1 

— 2 — Li  a au 

a 

V 


where 


A = 1 


N - 1 N-2 

P p ....  1 


Consider  the  feedback  optimization 


U 


?.zo 


m-  1 


m 


B. 


N-k 


= max 


x £ T ) P(.s|y.  )(8-/?.j 

i — j i j 


u c \j  i=  1 j = i + 1 
^ k 


kc 


J 


Nk 


(Uk-1<,.UkN)TATA(Uk-1<.UkN) 


(5.4-8) 


This  shows  that  the  maximization  does  not  depend  on  the  past 
observation  prograni,  the  solution  coincides  with  the  open-loop  case. 
Thus  for  the  case  with  scalar  gain  the  solution  to  the  OLF  problem 
coincides  with  the  solution  for  the  open-loop  synthesis  since  both 
maximizations  are  given  by 


un-iTataun-1 

U CU 


(5.4-9) 


c 


When  the  unknown  gain  8 C R , the  separation  of  the  cost  function 
as  in  (5.4-9)  is  not  possible.  This  can  be  shown  by  considering  the 
following  system  as  in  Section  (4.  1-2).  Let 


m 


xk‘E  hVi4 Z hVi  • k=1’2 N 

i- 1 i=l 


x.  + v » k=l,  2,  . . . , N 


*k  ~ G(0'  ak  ] 


(5.  4-11) 
(5.4-12) 


f i 


1 


J 


The  cost  function  (4.  1-29)  becomes 

, T 


B..  = UN_1  B*  ..  $>"*  R'1  O'1  H ..  UN_1 

ij  ~ N,  ij  N N N N,  ij  — 


The  feedback  optimization  is  given  by 


B 


N 


m - 1 m 

^ = max  E E p(j«Iz  ie(.s|zk  ) 
Y»k  1=1  j=i+1 


ke^. 


Nk 


(5.  4-13) 
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(Uk~*  ,UkN)  b'J.  ..4-"1  R-»n  ..  (Uk_1  , UkN 

— — N,  ij  N N N N,  ij  — — ) 


k=  1,2,...,  N-l 


(5. 4-14) 


1 


Now  it  is  clear  that  the  inputs  U are  a function  of  the  observation 
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5.5  Numerical  Example 


W e consider  the  second  order  example  given  by 


x = O,  x -t  O x + jS  u, 
k 1 k- 1 l k-2  k*  1 


(5.5-1) 


\ = xk  4 vk 


vk~G<°,l.)V. 


(5.5-2) 


(5. 5-3) 


This  example  is  considered  in  Chapter  3 to  study  open-loop  synthesis. 

Figure  (5.5-1)  shows  the  open-loop  and  feedback  inputs  for  N = 20  . 

The  feedback  input  has  a smaller  attenuation  overall,  compared  to  the 

open-loop  signal.  Figure  (5.5-2)  is  a plot  of  the  a posteriori 

I k 

probabilities  P(9q[Z  ) with  open-loop  and  feedback  inputs.  The 
transient  response  with  the  feedback  signals  is  much  smoother  than 
with  open-loop  inputs.  The  small  improvement  in  learning  v/ith 
feedoack  synthesis  is  obtained  at  a much  higher  cost,  the  feedback 
syntheis  requiring  approximately  four  times  as  much  computation 
time  as  the  open-loop  case. 

The  improvement  in  learning  achieved  with  feedback  identification 
will  vary  v/ith  the  type  of  system  and  the  problem  must  be  analyzed 
individually.  The  procedure  of  computing  the  feedback  lav-  is 
completely  illustrated. 
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FIGURE  5.  3-1 

Open  -loop  and  feedback  inputs  for  the  second  order 
system. 

Feedback  inputs 

Open-loop  input. 


FIGURE  5.  3-2 

Comparison  of  a posteriori  probabilities  P(£  | ) for 

the  open-loop  and  feedback  inputs. 

P(_0  I ) with  feedback  inputs 

I k 

P(0  | z ) with  open-loop  inputs. 
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CHAPTER  VI 

SUMMARY  AMD  COMCI.USIONS 

A summary  of  the  salient  features  of  the  dissertation 
presented  in  earlier  chapters  is  given  below.  Some  concluding 
remarks  and  possible  extensions  and  suggestions  for  future  research 
are  contained  in  section  (6-2). 

6.  1 Summary 

It  was  pointed  out  that  the  system  identification  can  be  treated 
as  the  initial  phase  of  a general  control  problem.  Before  the  control 
policies  are  implemented  a fast  and  efficient  identification  phase  is 
preceded  to  determine  the  unknown  parameters  of  the  system.  Such 
a procedure  will  avoid  solution  to  a costly  and  time  consuming  dual 
control  problem.  This  leads  to  the  question  of  selecting  optimal 
input  signal  sequence  so  that  some  measure  of  learning  is  minimized. 

The  solution,  set  in  an  adaptive  filtering  framework,  computes  the 

. k 

a posteriori  probability  of  the  parameters,  P(9^  | _Z  ) , recursively. 
This  is  a sufficient  statistic  containing  the  information  of  the 
measurements  about  the  unknown  parameters.  Distance  functions 
representing  the  affinity  of  distributions  characterized  by  distinct 
parameter  sets  are  defined  and  used  as  optimality  criteria. 
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In  Chapter  2 the  pairwise  Bhattacharyya  distance  is  introduced 

and  several  of  its  properties  arc  studied.  It  was  shown  that  the 

B-distancc  is  monotonically  nondecreasing  as  a function  of  time. 

The  important  ordering  property  of  this  measure  with  respect  to 

the  probability  of  error  (in  the  sense  of  minimum  Bayes1  risk)  is 

given  by  theorem  (2.3-2).  The  distance  measures  arc  shown  to  be 

N 

closely  related  to  the  mutual  information  I(Z  ; 0^)  . The  divergence 

and  B-distance  form  upper  bounds  on  the  mutual  information.  Using 

this  fact  a lower  bound  for  the  mean  square  error  E (9  - 0 ) is 

derived  from  Shannon's  rate  distortion  theory. 

The  Bhattacha ryya  coefficient  can  be  interpreted  as  the  cosine 

of  the  angle  between  two  points  on  a unit  hyper  sphere  with 

p(Z  I £a  ) ? and  I £y  ) "2  being  the  d irection  cosines 

of  the  two  points.  Using  this,  the  differential  metric  B^  Q ±j\c)  is 

expressed  in  terms  of  the  Fisher  information,  being  the 

perturbation  of  0 . Thus  B„  ~ , , Q is  a function  of  the  sensi- 

— A_0 

tivities  of  the  observations  with  respect  to  the  system  parameters. 
The  B-distance  is  applied  for  the  first  time  in  this  thesis  as  a 
measure  of  signal  selection.  It  is  interesting  to  note  that  these 
discriminant  functions  which  originally  appeared  in  the  context  of 
statistical  data  analysis  have  a natural  extension  in  application  to 
more  general  dynamic  systems.  Furthermore  (he  mutual 
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information  can  be  replaced  by  the  easily  computable  distance 

functions  in  practical  applications. 

In  Chapter  3 solution  to  the  input  synthesis  problem  in  linear 

dynamic  systems  is  presented.  The  linear  system  is  described  in 

terms  of  the  state  space  equations  and  the  distance  function  is 

derived  for  the  Gaussian  noise  statistics.  It  was  shown  that  this 

quantity  is  quadratic  in  the  inputs  and  the  maximization  problem 

under  energy  constraint  has  a global  solution.  The  optimal  input 

vector  is  given  by  t he  eigenvector  of  a symmetric  positive  definite 

matrix  corresponding  to  its  maximum  eigenvalue.  The  calculation 

of  eigenvectors  and  eigenvalues  requires  a computational  time  that 

increases  as  the  square  of  the  order  of  the  matrix.  A simple 

gradient  technique  is  derived  to  compute  the  optimal  input  sequence. 

For  the  Bayesian  learning  scheme  it  was  shown  in  theorem  (3.3-1) 

that  for  a given  positive  integer  N , C > 0 (5  sufficiently  small), 

there  exists  a value  of  the  input  signal-to-noise  ratio  such  that  for 

all  n > N , p (^9  | Z )>1-C  . This  gives  a sufficient  condition 

for  the  estimation  error  to  be  less  than  a given  fidelity  criterion. 

The  learning  performances  with  inputs  obtained  from  the 

B-distance,  divergence  and  pulse  inputs  are  compared.  The  results 

2 

show  that  as  a function  of  the  plant  noise  covariance  <7  , the 

w 

inputs  obtained  by  the  B-distance  perform  much  better  than  that 


obtained  by  the  divergence  measure,  the  difference  increasing  as 
2 

a function  of  CT  , This  conclusion  agrees  with  the  studies  of 
w 

Grettcnberg  [3-1]  and  Kailath  [41]  in  the  selection  of  communication 
links-  The  open-loop  inputs  derived  in  Chapter  3 can  be  approxi- 
mated by  low  order  difference  equations  of  the  form 

\ ii  hVi 

i = l 

In  the  numerical  examples  presented  in  section  (3-5)  second,  third 
and  fourth  order  least  square  fits  (i.  e-  s = 2,  3,  4)  are  obtained 
for  first  and  second  order  systems.  These  are  shown  graphically 
and  the  corresponding  distance  functions  are  compared  with  the 
optimal  value. 

In  Chapter  4 an  alternate  approach  to  the  signal  selection 
problem  is  presented.  In  particular  we  consider  generating  these 
inputs  from  white  noise  using  linear  stochastic  processes  of  the 
autoregressive  moving  average  type.  These  inputs  are  used  in 
single-input  single-output  systems.  To  study  the  properties  of 
these  signals  the  sensitivity  index  is  used  and  is  shown  to  be 
related  to  the  Bhattacharyya  distance  for  the  stochastic  inputs. 

The  properties  of  autoregressive,  moving  average  and  mixed  process 
inputs  are  studied  using  the  asymptotic  properties  of  Toeplilz 
matrices.  The  input  correlation  matrix  is  approximated  by 
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appr°priate  Toeplitz  forms  aiul  using  the  results  on  asymptotic 

eigenvalue  distribution  of  Toeplitz  matrices  it  is  shown  that  when 

2 

the  variance  of  the  input  white  noise  o ^ is  fixed,  among  the  class 
of  all  admissible  input  processes,  the  input  process  with  the  maxi- 
mum cost  function  also  has  the  largest  value  for  the  integral 
(4.2-11).  Thus  an  optimal  spectral  shape;  is  obtained.  The  resulting 
problem  gives  rise  to  a reduced  optimization  in  the  space  of  the 
process  parameters.  A condition  to  be  satisfied  for  the  above  result 

to  hold  is  that  the  spectral  density  s (X)  > 0 , This  is  indeed  true 

u 

in  all  the  linear  input  processes  considered  above.  Numerical 
results  show  that  the  autoregressive  AR(2)  and  ARMA(1,  1)  inputs 
perform  better  than  the  moving  average  MA(2)  inputs.  Generation 
of  these  inputs  requires  only  a few  delay  lines  and  unlike  the  binary 
sequences  used  by  earlier  investigators,  the  input  sequence  is  much 
more  smoother  in  its  variation. 

Synthesis  of  feedback  inputs  is  the  topic  of  Chapter  5.  Open- 
loop  feedback  techniques  are  used  to  incorporate  observation 
information  in  the  input  sequence.  It  is  shown  that  at  any  stage  k, 
the  feedback  input  can  be  represented  as  the  sum  of  open-loop  signal 
at  stage  k and  a correction  term.  The  feedback  becomes  ineffec- 
tive when  the  unknown  parameter  is  contained  only  in  the  gain  term. 
Numerical  results  indicate  that  the  improvement  in  learning  gained 
using  feedback  inputs  is  small  compared  to  that  with  open-loop  inputs. 
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6.  2 Concluding  Remarks  and  Suggest  ions  for  Future  Research 


The  problem  of  identificatioii  and  control  is  a subject,  of 
interest  in  many  applications  such  as  chemical  industry  processes, 
nuclear  reactor  control,  aerospace  applications,  etc.  These  dual 
control  problems  require  efficient  solutions  where  time  and  cost  of 
operation  is  important.  It  is  suggested  here  that  the  dual  control 
can  be  redefined  as  consisting  of  two  phases  - the  identification  phase 
and  the  control  phase.  To  accomplish  the  first'  it  is  necessary  to 
design  input  sequences  that  are  constrained  in  their  total  energy  and 
we  use  all  the  available  information  about  the  system,  so  that  the 
identification  phase  will  use  only  a short  time  span  of  the  total 
operation  time.  The  methods  developed  and  presented  here  provide 
a promising  approach  to  the  solution  of  these  problems. 

The  Bayesian  approach  to  the  identification  problem  requires 
a proper  quantization  of  the  parameter  space  <o  t(.  1 tl 
parameter  set  is  also  included  in  this  dissert  ti  n.  'ft  . it  i. 
necessary  to  develop  methods  to  obtain  efficient  qu  nti  • . n of  the 
parameter  space.  A dual  problem  of  input  design  is  the  design  of 
measurement  subsystems.  In  practice  this  corresponds  to  the 
design  of  sensor  locations  in  test  set  ups.  The  combined  design 
of  input  and  measurement  system  is  another  area  where  further 


investigation  is  necessary. 


In  section  (?..  6)  we  derived  an  expression  for  the  differential 


metric  n in  terms  of  the  Fisher  information.  Use  of  this 

a , U i A 0 


quantity  as  a criterion  function  is  worth  further  study.  Another 


important  area  of  signal  synthesis  application  is  in  distributed 


parameter  systems.  These  belong  to  the  category  of  thermal  and 


chemical  processes  and  nuclear  reactors  (see  Kerlin  [45]  ),  where 


quantities  such  as  temperature  coefficients  of  reactivity  and  heat 


transfer  coefficients  are  to  be  determined. 


Further  theoretical  and  experimental  investigation  is  necessary 


to  completely  characterize  the  stochastic  inputs  discussed  in 


Chapter  4.  It  is  an  interesting  problem  to  study  under  what  con- 


ditions the  system  frequency  response  and  the  input  process 


frequency  response  match. 


The  feedback  identification  discussed  in  Chapter  5 showed  that 


the  synthesis  of  inputs  via  the  open-loop  feedback  technique  did  not 


significantly  improve  the  parameter  learning  over  the  open-loop 


synthesis.  This  will  depend  on  the  type  of  system  and  the  experiment 


length,  and  further  work  should  be  carried  out  to  study  the  effect  of 


optimal  closed-loop  inputs  and  to  classify  systems  whose  performance 


may  improve  with  feedback  inputs. 
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APPENDIX  15 


Bhattacharyya  Distance  for  Gaussian,  Gamma , 
Poisson  and  Mult  inominal  Dist  r ihut ions 

I.  Multivariable  Gaussian  Distribution: 


P(ZNl®a>  - G (Ea  , PJ  . P(/1N|Gy)  = G (ty  , Zy) 


.N, 


B 


ay 


1(ivvt£ 


-1 


V + 2 In 


del  E 


jdet  E^  dct  E^ 


(13-1) 


) 


E=  (E  +E  ) . 

2 a y 


2.  Gamma  Density 


b + 1 

a 


p(zl0a)  = - — 


°ay  f 


r(b  4i) 
' a 


b 4- 1 b + 1 
a y 
c c 

a y 


a 


r b 4 1 b 4 1 

„ y 

c c 

__a_ y 

r (b  t i)i'  (b  4i) 

' a y 


- c 

a 

Z 

U(Z)  , 

b > 
a 

0 , 

i 

2 

b 4 b 

1 

a y 

(c 
v a 

2 

Z 

e 

/b 

4 b 

\ 

i ( a 

_ y.s. 

>) 

A 

2 

b 

4 b 

a 

y 

4 1 

2 

c 

K ) 

a 

y( 

a 


dz 


(B-2) 


244 


Li 





) 


APPENDIX  C 


Numerical  Method  for  the  Solution  of  Energy 


Constrained  Inputs 


When  the  number  of  stages  N is  large,  determination  of  optimum 
inputs  requires  finding  eigenvectors  of  high  dimensional  matrices. 

This  requires  a computation  time  which  increases  as  the  square  of  the 
matrix  dimension.  In  order  to  circumvent  this  a numerical  scheme  is 
presented  for  the  energy  constrained  signals.  This  method  is  similar 
to  the  method  of  gradient  projection  [67]. 


C.  1 Geometric  Representation  of  the  Problem 


Consider  the  two  dimensional  quadratic  cost  function 


1 T T 

f (u  1 , u ) = — u Q u + a_  u 


2 2 _ 
with  constraint  u^  + u s F 


(C-l) 


(C  - 2) 


We  want  to  maximize  f (u^u^)  subject  to  the  constraint  (C-2). 
The  geometric  aspects  of  the  problem  are  shown  in  Figure  (C-l). 


Let  be  the  point  where  an  initial  cost  contour  intersects  the 

constraint  boundary.  Then  A is  the  gradient  of  constraint  at 


D (y  c)  and  DB  is  the  gradient  of  cost  function,  Vf  . From  the 


figure  the  angle.  6 ranges  from  0 to  — . These  correspond  to 


2.-17 


maximum  and  minimum  direction  coincidence  of  the  two  gradients; 


when  0 ~ 0 we  are  at  an  extreme  point  of  th  <■  function. 


Consider  a direction  OB  that  makes  an  angle  8 <6  with  OA  . 
OB  intersects  the  cost  gradient  Vf  at  B . The  vector  OB  is  given 
by 

OB  = uQ  4 kVl(uQ)  (C-3) 

When  OB  is  scaled  down  to  OE  we  get  the  new  point  K which 
satisfies  the  constraint. 


u = OE  = OB 


\TT 

im 


(C-4) 


The  problem  now  is  to  determine  k such  that  the  new  point  E is  as 
close  to  the  extremum  as  possible. 

C.Z  The  Method  of  Solution 

The  angle  8 between  the  gradient  vectors  Vc  and  Vf  is 
calculated  as  follows 


cos 


e = 


( V £ ■ Vi) 

"ivc  H ii^ir 


(C-5) 


Choose  8 <8  . The  new  direction  OB  is  obtained  as 


OB  = tu  = Uq  + k v_f  (u  ) 


(C  -6) 


The  step  size  k is  given  by  the  solution  of  the  following  relationship 
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coj  B = 


<uo-;° 


(-lVuo  + k !%)> 
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(C-7) 


In  (C-7)  all  the  quantities  are  known  except  k that  can  be  obtained  by 
a simple  solution  of  a quadratic  equation  as  follows. 

Let  cos  B - fv  . Then  from  (C-7) 
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Simplifying  (C-9)  gives 


?r  K"0  • vi>i2  ? 

k ~zf~7z III2 

a IIuqII 


v -1  ' + li~z  ^0  ’ vi>  - 2 <u  n ’ v f 

J Lcx  U 0 “J 


+ M2(~rj- 1 ■ 0 
' <*  ' 


Equation  (C-10)  is  of  the  foj 


(C-10) 


a k + b k + g = 0 


(C-ll) 
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Scale  down  u'  such  that  u •!  1 - - C • 

“l  t 1 ~ i |i  / || 

— i + i!l 


Step  4.  Set  i = i + 1 . Compute  V f and  V c 

— i — i 


lf  llVi-  - v£.I!<  c , stop. 


Step  5.  Go  to  Step  2. 


The  convergence  of  the  above  procedure  is  guaranteed  because  of 
quadratic  cost  function  and  convex  constraint  region.  The  value  of 
t c (0,  1)  is  selected  after  some  experimentation.  In  the  examplesof 
Section  (3.6)  t = .8  gave  satisfactory  convergence  rate.  Note  that 
the  use  of  penalty  functions  is  not  necessary  for  the  method  outlined 
here.  Table  (C-l)  gives  the  convergence  of  the  algorithm  for  the 
example  with  N = 20. 


Tabic  C - 1 


Convcr^i’m  u of  th 

c Quad  rat  k Mu: 

•: i i i xat  ion  J-’rubliMii 

v/ith  Kncrcv  Constr 

a int , N - 20  . 

Second  Order  System 

ITERATION 

e 

COST  FUNCTION 

1 

6.  35911E-01 

17. 8537 

2 

6.  35872E- 0 1 

40.  0499 

3 

3.  1 7605  E- 01 

6S. 9881 

4 

1. 35247E-01 

77.  5470 

5 

6.  56942E-02 

79.  2250 

6 

4.  17047E-02 

79. 6626 

7 

3.  1 9 1 95 E- 02 

79. 8563 

8 

2.  60 1 7 1 E- 02 

79. 9742 

9 

2.  15181E-02 

80.  0533 

10 

1. 78435E-02 

80.  1076 

1 1 

1 . 47978E- 02 

80.  1449 

12 

1. 22680E-02 

80. 1706 

13 

1. 01675E-02 

80. 1883 

14 

8.  42464E- 03 

80. 2004 

15 

6.  97936E- 03 

80. 2087 

16 

5.  781 34PJ-03 

80.  2144 

17 

4.  78858E-03 

80. 2184 

18 

3.  96607E-  03 

80. 2210 

19 

3.  28474E-03 

80. 2229 

20 

2.  72036  E- 03 

80. 2242 

21 

2.  25291E-03 

80. 2250 

22 

1. 86577 E- 03 

80.2256 

n 
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APPEXD1X  I) 


Proof  of  Theorem  3.  3-1 


The  a posteriori  probability  of  ()  - ^_G  can  be  written 


PI;  ®lz  ) = 


N P<Z  I/®)  I’; 


m 


E p(-N' !;-*  Pj 


i-  1 


Assuming  equal  prior  probabilities  this  can  be  written  as; 


p(^e|zN) 


m 


1 + E p(-NiL-) 

i*  i 

P (ZN|,  0) 


For  C sufficiently  small  we  want 

P(^elzN)  > 1 - e 

Let 

m 

« = E p(2?NU«) 

i * -t 

P(ZNb®) 

Then 

p(^  o IZN)  = (1  + 6 )'  1 


a s 


(D-l) 


(D-2) 

•i 


(D-3) 


(D-4) 


(P-5) 
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(D-8) 

Dividing  (D-8)  bv  (m-1)  and  1 1 s i n fr  t Vi  r»  a v 1 f \ 1 ry^  fit!/'  « i ? _ 


25f 


Taking  logarithms  on  both  sides  of 


. . 2 rr 

l±l  k=l  "k 


The  above  inequality  should  also  hold  for  the  mean  values  of  the 
respective  random  variables.  Averaging  with  respect  to  we  get 


A 

In  6 - In 


i m ( N 

E -E  -V  (h': 


E -E 

i *1  k=l  ' k 


For  the  system  described  by  (3.  3-16)  and  (3.  3-17)  the  RHS  can  be 
written  as 


l V*  -1-  /hT  UT  X2 

Z Lj  2 (-  l -k  ' h i *k} 


i iiuN‘1ii2 


k=l  ak 


T - 1 
M , . R M. . 
vi  N vi 


where  M.  . - M . i=l m . ill 


/n-  7 


2r>9 


M.  = hf  o.  * hT  p 

> ~ 1-1  - — i 


is  (N  X N)  matrix 


, T N - 1 „ T N-2  „ 
h p.  3 h O S 

1 -“l  — l - i 


• hT  P. 


Using  (D-12)  in  (D-ll)  it  follows 


From  (D-6)  we  require 


Combining  (D-16)  and  (D-17) 


( D - 1 4 ) 


2 2 2 

Rn  = diagonal  (cr^  , or,,  , O _ ) 


(D- 15) 


A m 2 

In  6 > In  (in-1)  = — ~ — V E liuN'1|l 

^ mT.k-'mJ. 

-Cl  IM  l i 


(D-16) 


In  6 < In  C , for  given  e > 0 


(D-17) 


In  ( m - 1 ) - — • - - - 
2 (m-  1 ) 


E liu 


U.N-1 


M \ . M, 
l\  N <t 


< In  e 


(D- 18) 
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Rearranging  wo  got 


— - y u3 

fm-  l\  JL*t  ' - 


2 (m-  1) 


T - 1 
M,.  Rm  M 
i N <i 


, m - 1 

In  “ * , m '■  2 


The  I, IIS  in  (D-19)  is  directly  proportional  to  the  IS  NR  . This 

quantity  is  a norm  with  respect  to  a symmetric  positive  definite 

N 1 ^ 

matrix.  Denoting  this  norm  by  !'u  ' j!  we  can  use  an  orthogonal 

X ]\T 

transformation  T ( R which  does  not  alter  the  length  of  the 


Thus 


?.  T 

„N-1  _T  T N-l 

I = U T T A T t U 


(D  - 20) 


Now  T A T 


iagonal  (X^  , X^  , . . . , X^)  where  X.  are  the 


eigenvalues  of  A matrix.  So  we  have 


T 

,N- 1 T 


T At  U 


= Z xiVi  • \ > 0 <D-2l> 


where 


UN-'  = Tu"-1 


(D- 22) 


Now  it  is  clear  that  ||tJ  |j  = jjtJ  |j  and  increasing  !!u^  3 j| 

■ i N - 1 1 1 2 

will  also  increase  | |U  ||  . From  the  definition  of  ISNR  in 

— A 

(3.3-19),  increasing  the  ISNR  will  also  increase  the  norm  in  (D-19). 
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APPENDIX  E 


Toeplitz  Matrices  and  A sympl otic 
Eigenvalue  Distribution 

This  appendix  presents  the  main  results  on  the  theory  of 
asymptotic  eigenvalue  distribution  of  Toeplitz  matrices.  The 
important  theorems  of  Grenander  and  Szego  [33]  and  some 
generalizations  and  related  theorems  by  Gray  [31,  32]  are  given  in 
this  appendix  without  proof.  The  interested  reader  is  referred  to  the 
above  references  for  a complete  expose'. 

Let  f(x)  be  a real-valued,  continuous,  intergrable,  bounded 
fuction  on  [-  ] such  that 


V 

dxlnf  (x)  > 


00 


(E-l) 


so  that  f (x)  > 0 almost  everywhere.  Let  us 
functions  as  L . 

Definition  E-l:  The  N X N Toeplitz  matrix 


refer  to  this  class  of 


T is  defined  as  follows 


N 


= * r*kji  ■ ^ f 


IT 


f(x)  e 


‘|k-j  U 


(Lx 


(E-2) 
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o 


The  matrix  T is  a positive  definite  matrix  and  its  eigenvalues 
N are  strictly  Positivo  for  all  N . 


Definition  E-2:  Consider  the  nonnegative  sequences  Ja  ! and 

I Nj 


l3k  n|  ' Assume  that  f°r  each  k and  N 


r 


Sk,N<R-  bk.N<R  (E-3) 

where  R is  independent  of  k and  N . The  sets  ’a  J and 

I k,  N ) 

|bk  N j are  said  to  be  asymptotically  equally  distributed  as  N-—°= 


if 


N-»  S T [r(nk>N)  - F(bk>N)l  = 0 


(E-4) 


k=l 


where  F(t)  is  an  arbitrary  continuous  function  on  [0,  R]  . 

The  following  definitions  describe  the  concept  of  one  matrix 
approximating  another  matrix. 


G) 


Definition  E-3:  L-et  L_.  be  an  N X N Mcrmitian  matrix  with 

— — N 

entries  LN(ifj)  and  eigenvalues  N>  k=l,2,...,  N.  The  strong 
norm  of  I,.,  is  defined  as 

N 


III. 


max 


N 


k N 


(E-5) 


m 


I nuuu  \.A  \wmmmnrnm 


264 


x t> 


The  weak  norm  of  L is  given  by 


M-lai:  e i^<u»rj 


j[ 

2,  2 


i=l  J = 1 


N 


(!,E 


2(  2 


I 


N 


k,  N ( 


k=l 


(E-6) 

0 


c 


Definition  E-4:  We  say  that  two  sequences  of  Hermitian  matrices 

exhibit  mutual  approximation  denoted  by  ~ G if  they  possess 
the  following  properties: 


(i)  Uniform  boundedness  in  both  norms,  that  is 


|lnII  , ||gnI|,  |ln|  , |gn|  * m < » 


lim  |l  - G I = 0 
N N 


(E-7) 


(E-8) 

Now  we  state  the  basic  theorem  on  the  asymptotic  eigenvalue 
distribution  of  Toeplitz  matrices  (see  [31,  theorem  5.2,  p 64]). 


Theorem  E-  1 : (Asymptotic  eigenvalue  distribution  of  Toeplitz 

matrices).  Let  f (x)  be  a real-valued  function  of  class  L . Denote 
by  m and  M the  essential  lower  and  upper  bounds  of  f (x) 
respectively,  and  assume  that  m and  M are  finite.  If  F (X)  is  any 
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♦ 

continuous  function  defined  in  the  finite  interval  m ^ X s M vc  have 

N n 

lim  iD  F|Xk,Nl=  zV/  F[f(*)]dx 

N " k=J  -V 


(E-9) 

0 


The  next  theorem  is  a slight  modification  of  Theorem  7.4  of  Grenander 
and  Szego  ([33],  p 105).  This  is  the  ba  sic  theorem  for  finding  the 
asymptotic  eigenvalue  distribution  of  approximately  Toeplitz  matrices. 


Theorem  E-2:  Given  a Toeplitz  matrix  L^T  and  a Hermitian  matrix 

Gn  such  that  Ln~Gn.  Let  }«k  N)^  and 

the  eigenvalues  of  L^T  and  G^  respectively.  If  either  lim  |l^| 


II  ( 

or  lim  G.J  is  finite  then  (O'  J 

' N / K, N 

' )k=  1 

asymptotically  equally  distributed. 


,N 


N 


ana  K«! 


are 


m 


The  following  theorem  of  Gray  gives  a necessary  and  sufficient 
condition  for  the  inverse  of  a Toeplitz  matrix  to  itself  approximate 
the  Toeplitz  matrix  T^  [ 1 / f (x)]  . 

Theorem  E-3;  Let  f (x)  C L and  f (x)  - m > 0 . Then  the 
eigenvalues  of  T^  [f  (x)]  are  asymptotically  distributed  as 
J/f  (x),x.  ( [-A,  . Furthermore 


.-1 


T"  [f  (x)]  ~ T [1/f  (x)] 


(E- 10) 


that  is.  T [f  (x)]  approximates  a Toeplitz  matrix. 

0 

Corollary  E-l:  Let  f (x)  > 0 everywhere  and  A^  ~ T [f]  where 

A.t  is  bounded  and  Uermitian.  Then 
N 

AN  ~TN[l/fl  (E-ll 

and  from  Theorem  E-2  the  eigenvalues  of  A ^ are  asymptotically 
distributed  as  1/f  (x)  , x f [-TT,  n]  . 

0 

Corollary  has  the  interpretation  that  if  a matrix  approximates  a 
Toeplitz  matrix  generated  by  f (x)  , then  the  inverse  approximates 
another  Toeplitz  matrix  generated  by  1/f  (x)  . 


Theorem  E-4:  Let  f (x)  and  g (x)  belong  to  L . Then 

Hm  |T  [f]  T [g]  - T [f  g]|  = 0 (E-12 

N-*~“ 

and  the  eigenvalues  of  T^  [f]  T fg]  are  asymptotically  distributed 
as  f (x)  g (x)  , x C [-tf.ff]  . 

CD 

Corollary  E-2:  Let  the  bounded  Hcrmitian  matrices  A__  and  E 

N N 

approximate  T^  [f]  and  T [g]  respectively.  Then 

Nir»|ANRN-  TN[fs11  = 0 <E‘13! 


and  the  eigenvalue  of  A B arc  asymptotically  distributed  as 


r 
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